The present invention provides a gene that is present in a region of a chromosome commonly deleted in breast and
ovarian cancers and that encodes a novel protein, the protein ("MDC protein") encoded by the gene, a method for
the diagnosis of cancer using an antibody capable of binding to the protein, and others.

A detailed genetic map of human chromosome 17 was constructed to analyze the chromosome in breast 
and ovarian cancer tissues, and a gene encoding a novel protein was cloned and its structure was determined. 
As a result of gene analysis using DNA probes derived from the gene, a gene mutation was confirmed in breast 
cancer tissues. Moreover, a transformant carrying a plasmid containing the gene was grown to obtain the MDC 
protein. Furthermore, a monoclonal antibody was prepared by using the protein as antigen. 







EP 0 633 268 A2 



Background of the Invention 
Field of the Invention 

The present invention relates to MDC proteins, DNAs encoding the same, and gene analysis methods
using the DNAs. The present invention can be utilized in such fields as medical treatment and diagnosis.

Description of the Related Art 

The view that mutations in cellular proteins play an important role in the onset of cancer has long
been held. Recent advances in genetic engineering have enabled the analysis of gene mutations in tumor
cells and have brought about marked progress in the field of cancer research.

Up to this time, the analysis and identification of oncogenes have progressed to the point that several
tens of them are now known. In recent years, on the other hand, attention has focused on tumor suppressor
genes. The tumor suppressor genes discovered thus far include the Rb gene for retinoblastoma (Friend, S.H.
et al., Proc. Natl. Acad. Sci. USA, 84, 9095, 1987), the p53 gene (Lane, D.P. et al., Nature, 278, 261, 1979)
and the APC gene (Kenneth, W.K. et al., Science, 253, 661, 1991) for colorectal tumor, the WT1 gene for
Wilms' tumor (Call, K.M. et al., Cell, 60, 509, 1990), and the like. In the case of the p53 gene, some
families are known to inherit mutations in the gene ["Li-Fraumeni syndrome" (Malkin, D. et al., Science,
250, 1233, 1990; Srivastava, S. et al., Nature, 348, 747, 1990)]. Moreover, it is becoming increasingly
clear that defects in multiple genes, and not in a single gene, contribute to the progression of the
malignant phenotype of cancer, and it is believed that many more unidentified oncogenes and tumor
suppressor genes exist. Their discovery and elucidation are awaited not only by investigators and
clinicians but also by the general public throughout the world.

Breast cancer is classified into hereditary (familial) breast cancer and nonhereditary (sporadic) breast
cancer, and hereditary breast cancer is classified into early-onset and late-onset disease according to the
age of onset. Linkage analyses have revealed that at least early-onset familial breast cancer is
linked to a very small region on chromosome 17 (Hall, J.M. et al., Science, 250, 1684-1689, 1990).
Moreover, it has been shown that hereditary ovarian cancer is also linked to the same region (Narod, S.A. et
al., Lancet, 338, 82-83, 1991).

Accordingly, it is believed that a tumor suppressor gene is present in this region and protein deficiency 
or mutation induced by an allelic deletion or mutation of the gene is one of the causes of breast and ovarian 
cancers. 

It is believed that in the onset of common (sporadic) breast cancer as well, the occurrence of an
acquired mutation or allelic deletion of the gene in this region results in protein mutation or deficiency, and
that this causes the transformation of a normal cell into a breast cancer cell (Sato et al., Cancer Res., 51,
5794-5799, 1991). Consequently, isolation of the causative gene present in this region and identification of
the protein encoded by the gene are urgently awaited not only by physicians and investigators throughout the
world but also by the general public, particularly women in Europe and America, where there are numerous
patients with breast cancer.

The present invention provides novel proteins involved in breast and ovarian cancers, DNAs encoding 
them, and methods for the testing and diagnosis of cancer by using them. 

The present inventors disclose a novel gene encoding a 524-amino acid protein which was isolated
from chromosomal region 17q21.3, where a tumor suppressor gene(s) for breast and ovarian cancers is
thought to be present (Nature Genetics, 5, 151-157, 1993; this paper is referred to in Nature Genetics, 5,
No. 2, 101-102, 1993).

Disclosure of the Invention 

Brief Description of the Drawings

Fig. 1 is a diagram showing the positions on chromosome 17 to which 342 cosmid clones hybridize. 
Clone names are designated by clone numbers alone. 

Fig. 2 is a diagram showing partial deletions on chromosome 17q in ovarian cancers. Solid circles
represent the loss of heterozygosity (LOH) and open circles represent the retention of both alleles. Two
commonly deleted regions are designated by sidelines.

Fig. 3 is a diagram showing partial deletions on chromosome 17q in breast cancers. Solid circles
represent the loss of heterozygosity (LOH) and open circles represent the retention of both alleles. Two
commonly deleted regions are designated by sidelines.

Fig. 4 is a diagram showing the process starting with markers on chromosome 17q21.3 and leading to
the isolation of the gene, as well as the regions where genomic rearrangements occurred in tumor tissues
(hatched boxes). Clone names are designated by clone numbers alone.

Figs. 5-7 are diagrams showing the detection of genomic rearrangements in breast cancers by
Southern-blot analysis. Symbols N and T represent DNAs from normal tissue and tumor tissue, respectively.

Fig. 8 is a graph showing a working curve for determining the concentration of the MDC protein by 
ELISA using a monoclonal antibody and a rabbit polyclonal antibody. 


Summary of the Invention 

The present inventors constructed a large number of cosmid clones into which DNA fragments of human
chromosome 17 had been introduced. Each of these cosmid clones was then localized along the chromosome
by fluorescent in situ hybridization (FISH; Inazawa et al., Genomics, 10, 1075-1078, 1991). The cosmid
clones (cosmid markers) localized on the chromosome enabled construction of a high-resolution physical map
of human chromosome 17. The clone names of the cosmids used as probes (i.e., the probe names) and their
detailed map positions are shown in Tables 1-3, and a diagrammatic summary of the mapping is shown in
Fig. 1. In Fig. 1, clone names are designated by clone numbers alone.

D17S563    17q25.2-q25.3
Table 3

No.   Probe name   Locus symbol   Chromosomal localization

263   cCI17-723                   17p13
264   cCI17-724    D17S564        17p11.2
265   cCI17-726                   17q25
266   cCI17-727    D17S566        17p13
267   cCI17-728    D17S567        17p12
268   cCI17-729    D17S568        17q11.2
269   cCI17-730                   17q21.3
270   cCI17-732    D17S570        17p13.2
271   cCI17-733                   17q25.1
272   cCI17-735    D17S572        17q25.3
273   cCI17-736    D17S573        17q21.3
274   cCI17-737    D17S557        17q25
275   cCI17-739    D17S575        17q25.1
276   cCI17-741                   17q25
277   cCI17-742    D17S577        17q25
278   cCI17-743                   17q23
279   cCI17-744                   17q23
280   cCI17-745                   17p13
281   cCI17-801                   17q11
282   cCI17-802                   17p11.2
283   cCI17-808                   17q25
284   cCI17-809                   17q23
285   cCI17-810                   17p13
286   cCI17-812                   17q25
287   cCI17-813                   17q23
288   cCI17-814                   17p11
289   cCI17-815                   17q24
290   cCI17-816                   17q23
291   cCI17-817                   17q23
292   cCI17-818                   17p11
293   cCI17-820                   17q12
294   cCI17-821                   17p13
295   cCI17-822                   17q11
296   cCI17-823                   17q12
297   cCI17-825                   17p11
298   cCI17-826                   17q11
299   cCI17-827                   17p11
300   cCI17-828                   17p11
301   cCI17-831                   17q25
302   cCI17-832                   17p11
303   cCI17-833                   17q23
304   cCI17-834                   17q11.2-q12
305   cCI17-835                   17q21.3
306   cCI17-841                   17p12
307   cCI17-1005                  17q21.3
308   cCI17-1008                  17q21.3
309   cCI17-1016                  17q23.1
310   cCI17-1018                  17q21.2-21.3
311   cCI17-1019                  17q23.1
312   cCI17-1024                  17q12
313   cCI17-1029                  17q11.2
314   cCI17-1030                  17q22
315   cCI17-1031                  17q11.2
316   cCI17-1032                  17q23.1-23.2
317   cCI17-1049                  17q21.3
318   cCI17-1055                  17q21.3
319   cCI17-1059                  17q21.1-q21.2
320   cCI17-1063                  17q12
321   cCI17-1073                  17q11.2
322   cCI17-1079                  17q12
323   cCI17-1082                  17q22
324   cCI17-1094                  17q21.1
325   cCI17-1101                  17q12
326   cCI17-1103                  17q11.2
327   cCI17-1106                  17q11.2
328   cCI17-1702                  17q21.1-q21.2
329   cCI17-1705                  17q21.2-q21.3
330   cCI17-1706                  17q21.1-q21.2
331   cCI17-1707                  17q21
332   cCI17-1709                  17q12
333   cCI17-1710                  17q21
334   cCI17-1711                  17q12
335   cCI17-1715                  17q12
336   cCI17-1717                  17q21
337   cCI17-1719                  17q11
338   cCI17-1720                  17q24
339   cCI17-1722                  17q23
340   cCI17-1723                  17q21.3
341   cCI17-1724                  17q11.2
342   cCI17-1725                  17q21.1
343   pCMM36                      17q23


From among these markers, those exhibiting restriction fragment length polymorphism (RFLP), in which
the lengths of restriction fragments vary among individuals, namely RFLP markers, were selected. The
selected marker clones, the restriction enzymes used, and the particular lengths of several fragments
detected thereby are shown in Tables 4-6.



Table 4

No.   Probe name   Locus symbol   Enzyme   Allele size (frequency)




2 


CCI17-7 


D17S860 


PvuII 


3.0 kb(0.33) 






16 


CCI17-97 


D17S861 


PstI 


1 . 8 + 1 . 2 kb( 0. 
8.2 kb(0.92) 


67) 




17 


cCH7-3l5 


D17S521 


TaqI 


4.7+3.5 kb(0. 
2.0 kb(0.67) 


06) 


10 


18 


CCI17-316 


017S862 


Mspl 


1.8 kb(0.33> 
3.1 kb(0.33) 






19 


CCI 17-317 


D17S522 


Taql 2.6-3.9 kb 4 


2.7 kb(0.67) 
alleles VNTR.60% 





Chromosomal localization



42 



CCI17-453 



cCI 17-469 



heterozygosity; also polymorphic with MspI, PstI, PvuII
D17S525  BglII  5.8-7.5 kb, 4 alleles, VNTR, 50% heterozygosity; also polymorphic with
EcoRI, TaqI, PstI, PvuII, MspI
D17S533  MspI  2.0-2.6 kb, 5 alleles, VNTR, 83% heterozygosity; also polymorphic with
EcoRI, TaqI, PvuII



5 4 


c CI I 7 - 4 8 7 


D i 7 3 538 


Ecok I 




5,8 k b ( 0 . 7 5 i 












3.3 kb ( 0 . 25 ) 


5 6 


cC 1 1 7 - 489 


Dl 7SS40 


Mspl 




3.3 Kb ( 0 . 25 ) 












l . 1 kb ( 0 . 5 0 ) 








Taql 




1.5 kb(0. 50) 












1 1 t |/k f n en \ 








PvuII 




1.2 kb ( 0 . 50 ) 












n 7 irh/n <;m 


58 


cCI 1 7-49 I 


D17S863 


Taql 




J . 0 K D \ U . I J J 












J.J K D t U . » D J 


59 


cCI 1 7 -492 


Dl 7S542 


Bgl I I 




? i if h / n in i 


6 I 










I a ic h t n cm 


cCI 17-494 


01 7S865 


EcoRI 




10.3 kb(0. 92) 


70 










f.o Ko{0. 008) 


cC I I 7 - 505 


01 7S544 


Mspl 




3.1 Kb(0. 58) 












3.0 kb(0. 42 ) 








Taql 




4.1 kb ( 0. 67 ) 












2.7+ 1.4 kb(0. 


7 I 


CCM7-506 


0I7S545 


Mspl 




3.0 kb(0.33) 












2.6 kb( 0 . 67 » 


73 


CCII7-5U3 


Dl 7S546 


Mspl 




4.6 kb<0.50) 












4.0 kb(0. 50) 


3 0 


cCI 17-516 


D I 7S550 


Taql 




4.1 kb f □ . 25) 












2.4+1.7 kb( 0 . 








Pvu [ I 




3. 4 kb(0.63) 


33 










2.2 kb(0. 17 ) 


cCIU-525 


D17S866 


Mspl 




2.7 kb(0.42| 






D17S58^7 






2.3 kb( 0. 53 l 


1 13 


cCI 17-562 


Taql 




3.5 kb( 0 . 42 ) 












3.2 kb(0.53) 








Pvu I I 




7.1 kb( 0 . 92 ) 


137 










6.6 kb{ 0. 08 ) 


cCI 17-584 


D17S868 


Mspl 




3.3 kb( 0. 25 ) 


166 










3,6 kb(0.75) 


CCI 17-615 


D17S869 


PstI 




5.2 kb(0.42) 


243 










4.7 kb(0.58) 


cCin-701 


D17S870 


Taql 1. 


7-2.5 kb 6 


alleles VNTR, 67% 








he t erozygos i ty 


also po 1 yraor ph i c w i 








Mspl . 


Ps t I , Pvu I I 


,RsaI 


244 


CCII7-702 


017S871 


Mspl 




4. 1 kb(0.83) 












3.4 kb(0. 17 ) 








Rsal 




5.2 kb(0.83) 








Bgl I I 




4. 1 kb(0. 17 ) 










6.6 kb(0.33) 








Pvu I I 




5.6 kb(0. 17) 










2.9 kb{ 0. 83 ) 












2. 2 kb(0. 17) 



17q2l.3 
I7q25. 1 -q 2 5 . 2 
17ql2-q2l . 1 
17qll.: 

I7pl3 

I7q25.2-q25.3 

17q25. 1 
17q23 



I 7 p 1 3 . 1 
1 7q I 1 . 2 

1 7 p 1 2 - p 1 I. I 

I 7q 2 I 

1 7q 25 . I -q 2 5 . 
I7q25. 1 



17q2l.3 
17q25.2-q25.3 
No.   Probe name   Locus symbol   Enzyme   Allele size (frequency)   Chromosomal localization

245   cCI17-703    D17S877
247   cCI17-705    D17S554
250   cCI17-708    D17S878
252   cCI17-710    D17S557
254   cCI17-712    D17S558
255   cCI17-713    D17S559
256   cCI17-714    D17S560
257   cCI17-715    D17S872
258   cCI17-716    D17S561
261   cCI17-721    D17S864
262   cCI17-       D17S563
263   cCI17-723    D17S873
266   cCI17-727    D17S566
268   cCI17-729    D17S568


TaqI    2.6-3.8 kb, 4 alleles, VNTR, 50% heterozygosity; also polymorphic with MspI, RsaI, PstI, PvuII
PstI    4.3 kb (0.50)
        2.3+2.0 kb (0.50)
PvuII   2.6-9.0 kb, 10 alleles, VNTR, 87% heterozygosity; also polymorphic with MspI, TaqI, BglII, PstI, EcoRI
MspI    2.0-2.6 kb, 5 alleles, VNTR, 100% heterozygosity; also polymorphic with RsaI, TaqI, PstI, PvuII, EcoRI
MspI    3.1 kb (0.58)
        2.9 kb (0.42)
TaqI    6.6 kb (0.67)
        4.3+2.3 kb (0.33)
PvuII   7.1 kb (0.50)
        3.9+3.2 kb (0.50)
MspI    2.2-2.8 kb, 3 alleles, VNTR, 50% heterozygosity; also polymorphic with

PstI 



Rsal 
TaqI 
Bgt I I 
Pvul 1 



Ps t I 

EcoRI 

TaqI 

Rsal 

3gl I [ 

Mspl 

Rsal 

Bgl I I 

Pvu I I 

EcoRI 

Mspl 

Rsal 

TaqI 

PstI 

Pvull 



4.5 kb(0.58) 
4.3 kb(0.42) 
3.8 kb(0.75) 

2.3 kb(0. 25) 
3.8 kb(0.53) 

3.5 kb(0.42j 

2.6 kb{0.58) 

2.4 kbt0.42) 

5 kb(0 . 58 ) 
4 kb( 0 . 42 J 
3 kb(0. 17 ) 
0 kb(0.B3) 

6 kb<0.87) 

3.3 kb(0. 13) 

2.4 kb(0. 37 ) 
1.3+1.1 kb( 0 . 1 3 J 



2.9 k b »' 0 . 2 5 ) 
1.6 kb(0.75) 
4.4 kb(0 . 83 > 
3.9 kb(0. 17 ) 

4. 1 kb(0.83) 
3.4 kb(0. 17 ) 

5. 2 kb(0.33) 

4.1 kb<0. 17 ) 
6.6 kb(0. 33 ) 
5. 6f)kb(0. 17 ) 
2.9* kb(0.83) 

2.2 kb(0. 17 ) 
13.0 kb<0.75) 
12.5 kb(0. 25) 
3.0 kb(0. 33 ) 

7 kb(0. 67 ) 

8 kb(0.70) 

5 kb(0.30) 

6 kb(0. 33 ) 

9 kb(0. 67 ) 
8 kbfO.50) 
3 kb(0. 50 ) 
6 kb(0. 58 ) 

kb(0. 42 ) 



Pvull 2.6-9.0 kb 10 alleles VNTR.87% 
heterozygosity also polyoorphic with 
Mspl.Taql.Bglll.PstI.EcoRI 

Mspl 4.6 kb(0.58) 

2.6 kb{0. 42 ) 



17p13
17p11.2
17p13
17q25.3
17p11.2
17p13
17q25.3
17q21.3
17p13
17q22-q23
17q25.2-q25.3
17p13
17p13
17q11.2
Table 6

No.  Probe name  Locus symbol  Enzyme  Allele size (frequency)                 Chromosomal localization

269  cCI17-730   D17S874       MspI    2.2-3.5 kb, 4 alleles, VNTR, 83%        17q21.3
                                       heterozygosity; also polymorphic with
                                       TaqI, BglII, PstI, PvuII
270  cCI17-732   D17S570       RsaI    3.2 kb (0.50)                           17p13.2
                                       2.7 kb (0.50)
                               BglII   8.5 kb (0.50)
                                       3.2 kb (0.50)
                               PstI    2.5 kb (0.58)
                                       1.7 kb (0.42)
                               PvuII   4.2 kb (0.50)
                                       4.1 kb (0.50)
271  cCI17-733   D17S875       MspI    3.4 kb (0.75)                           17q25.1
                                       2.6 kb (0.25)
272  cCI17-735   D17S572       MspI    4.1 kb (0.83)                           17q25.3
                                       3.4 kb (0.17)
                               RsaI    5.2 kb (0.83)
                                       4.1 kb (0.17)
                               PvuII   2.9 kb (0.83)
                                       2.2 kb (0.17)
273  cCI17-736   D17S573       TaqI    1.7-2.5 kb, 7 alleles, VNTR, 100%       17q21.3
                                       heterozygosity; also polymorphic with
                                       MspI, RsaI, PstI, PvuII
275  cCI17-739   D17S575       MspI    3.3 kb (0.33)                           17q25.1
                                       2.4 kb (0.67)
278  cCI17-743   D17S876       TaqI    4.3 kb (0.17)
                                       2.8 kb (0.83)




RFLP markers can be used to distinguish between the two alleles inherited from the parents when the
alleles differ in polymorphic pattern ("informative"); the alleles are indistinguishable when both have
the same polymorphic pattern ("not informative"). If such a difference in polymorphic pattern between two
alleles ("heterozygosity") exists in normal tissues and the loss of heterozygosity (LOH) is detected in
tumor tissues, this implies an allelic deletion at the RFLP marker site on a specific chromosome in the
tumor tissues. It is generally believed that the inactivation of tumor suppressor genes on both alleles,
as caused by the deletion of one allele and a mutation in the other, may lead to malignant
transformation. Thus, it is assumed that a tumor suppressor gene is present in a region commonly deleted
in many cancers.
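
The LOH reasoning described above can be illustrated with a minimal sketch. It assumes that each case is summarized, for one RFLP marker, as the set of restriction fragment sizes (in kb) detected in normal DNA and in tumor DNA. The function names `classify_marker` and `loh_frequency` and the fragment sizes in the example are hypothetical and serve only to illustrate the logic; they are not taken from the specification.

```python
from typing import Iterable, List, Tuple


def classify_marker(normal: Iterable[float], tumor: Iterable[float]) -> str:
    """Classify one RFLP marker in one patient from fragment sizes (kb)."""
    normal_set, tumor_set = frozenset(normal), frozenset(tumor)
    # A single band in normal tissue cannot distinguish the two alleles.
    if len(normal_set) < 2:
        return "not informative"
    # Both parental bands still present in the tumor.
    if tumor_set == normal_set:
        return "retention"
    # Exactly one of the parental bands is left in the tumor.
    if len(tumor_set) == 1 and tumor_set < normal_set:
        return "LOH"
    return "unclassified"


def loh_frequency(cases: List[Tuple[Iterable[float], Iterable[float]]]) -> float:
    """Fraction of informative cases that show LOH."""
    calls = [classify_marker(normal, tumor) for normal, tumor in cases]
    informative = [call for call in calls if call != "not informative"]
    if not informative:
        raise ValueError("no informative cases")
    return informative.count("LOH") / len(informative)


# Hypothetical example: three patients typed with one marker.
example_cases = [
    ([4.1, 3.4], [4.1]),       # one band lost in the tumor
    ([4.1, 3.4], [4.1, 3.4]),  # both bands retained
    ([4.1], [4.1]),            # homozygous, not informative
]
print(loh_frequency(example_cases))
```

Only informative cases enter the denominator, which mirrors the way deletion frequencies are reported for informative cases in this specification.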

Using the detailed chromosome map and RFLP markers thus obtained, the present inventors examined
about 300 breast cancers and about 100 ovarian cancers for LOH on chromosome 17. As a result, it was
revealed that, in informative cases, a region (of 2.4 cM) lying between cosmid markers cCI17-701 and
cCI17-730, located in the neighborhood of 17q21, was deleted with high frequency.

Fig. 2 shows partial deletions on chromosome 17q in ovarian cancers. Solid circles represent the loss of 
heterozygosity (LOH) and open circles represent the retention of both alleles. Two commonly deleted 
regions are designated by sidelines. 

Fig. 3 shows partial deletions on chromosome 17q in breast cancers. Solid circles represent the loss of 
heterozygosity (LOH) and open circles represent the retention of both alleles. Two commonly deleted 
regions are designated by sidelines. 

One of the commonly deleted regions partially overlapped the region in which the presence of a
causative gene had been suggested by linkage analyses of families affected with hereditary breast cancer.
When 650 cases of sporadic breast cancer were examined for somatic rearrangements by Southern-blot analysis
using cosmids located in the overlapping region as probes, it was revealed that a partial region in the DNA
of cosmid clone cCI17-904, which had been selected as described above, detected amplification. On closer
examination of these alterations, it was found that segments of about 6-9 kb each were connected with
each other to form an abnormal repetition consisting of about 4-6 copies. Moreover, a gene encoding a
novel protein was isolated by screening cDNA (DNA having a complementary base sequence
reverse-transcribed from messenger RNA) libraries using, as probe, a restriction fragment of this cosmid
clone having a sequence conserved among other species. When the sequence structure of this gene
was determined and the presence or absence of genomic alterations of this gene in breast cancers was
examined, a distinct gene mutation was identified. These results have revealed that deficiency or mutation
of this protein and the allelic deletion or mutation of the DNA encoding it are deeply involved in the
onset of breast and ovarian cancers.

Fig. 4 shows the above-described process starting with a group of markers and leading to the isolation
of the gene, as well as the regions where genomic rearrangements occurred in tumor tissues. Clone names
are designated by clone numbers alone.

The present invention is very important in that it can provide methods and materials for solving difficult
problems (such as risk diagnosis, early detection, monitoring of the clinical course, determination of a
treatment plan, and estimation of prognosis) concerning at least a part of breast and ovarian cancers, for
example, by examining the presence or absence of deficiency or mutation in the protein of the present
invention or the presence or absence of the allelic deletion or mutation of the gene encoding it, and can
thereby bring about a marked advance in the technology in this field.

Specifically, the present invention provides (1) an MDC protein which comprises the whole or part of the
protein represented by SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4, or which consists of a
protein substantially equivalent to one comprising the whole or part of the protein represented by SEQ ID
NO:1, SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4, (2) a DNA which comprises the whole or part of the
DNA represented by SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9, or which
consists of a DNA substantially equivalent to one comprising the whole or part of the DNA represented by
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9, (3) a plasmid containing the
DNA as set forth in the above (2), a transformant carrying the plasmid, i.e., a transformant transformed with
the plasmid, and a process for the production of the MDC protein described above, which comprises the
steps of culturing the transformant described above and collecting the resulting expression product, (4) an
antibody which can bind to the MDC protein described above as an antigen, and (5) a primer, probe or
marker which has a DNA sequence comprising a part of the DNA sequence of the DNA as set forth in the
above (2), or a DNA sequence complementary to a part of the DNA sequence of the DNA as set forth in the
above (2), and a gene analysis method which comprises the step of hybridizing the primer or probe
described above to a DNA to be tested.

The term "MDC protein" in this specification means a protein or a peptide (including an oligopeptide
and a polypeptide) that falls within the definition of the MDC protein given above.

Further scope and applicability of the present invention will become apparent from the detailed
description given hereinafter. However, it should be understood that the detailed description and specific
examples, while indicating preferred embodiments of the invention, are given by way of illustration only,
since various changes and modifications within the spirit and scope of the invention will become apparent to
those skilled in the art from this detailed description. The present invention will be specifically described
hereinbelow.

Detailed Description of the Invention 

(1) Isolation of cDNA clones 

Cosmid clones having a DNA derived from human chromosome 17 introduced thereinto can be 
produced, for example, by extracting chromosomal DNA from a human-mouse hybrid cell line containing a 
single human chromosome 17 in a mouse genomic background, and incorporating fragments of the 
chromosomal DNA into a vector such as pWEX15, according to a method reported by Tokino et al. (Tokino 
et al., Am. J. Hum. Genet., 48, 258-268, 1991). From among them, clones having an insert derived from the 
human chromosome can be selected by colony hybridization using the whole human DNA as probe. 

The map position of each of the cosmid clones can be determined by FISH. Then, they can be used as 
markers to construct a high resolution physical chromosome map. Moreover, RFLP markers can be 
selected on the basis of the fragment length pattern in Southern-blot analysis (Nakamura et al., Am. J. Hum.
Genet., 43, 854-859, 1988). If this map and these RFLP markers are utilized to examine DNAs obtained 
from the tumor tissues of cancer patients for LOH (loss of heterozygosity), the commonly deleted region on 
the chromosome in the tumor tissues can be localized to a very small region near q21 of chromosome 17. 

Southern-blot analysis of the DNAs from tumor tissues by using a cosmid clone, whose hybridizable 
portion is present in this localized region, as probe makes it possible to select clones having a DNA 
sequence associated with genomic alterations in the tumor tissues. Moreover, Southern-blot analysis of the 
chromosomal DNAs of various mammals by using restriction fragments of the cosmid clone as probes 
makes it possible to select a fragment containing a DNA sequence conserved among other species and 
involved in fundamental cellular functions. DNA sequences encoding important proteins are often conserved 
among other species. In fact, many of the hitherto isolated genes for hereditary diseases are conserved
among other species (Call, K.M. et al., Cell, 60, 509-520, 1990).

If the DNA fragment thus obtained is used as probe, the cDNA of the gene present in a localized region
near q21 of human chromosome 17 can be cloned. The base sequence of this cDNA can be determined in
a conventional manner (Maniatis, J. et al., Molecular Cloning 2nd. ed., Cold Spring Harbor Laboratory Press,
N.Y. 1989).

In order to confirm that the DNA clones thus obtained are clones of the desired causative gene, their
sequences may be used to examine the presence or absence of genomic alterations in cancer patients and
the incidence of genomic alterations according to the SSCP method (Orita, M. et al., Genomics, 5, 874-879,
1984; Orita, M. et al., Cell, 60, 509-520, 1990), the RNase protection method (Winter, E., Perucho, M. et al.,
Proc. Natl. Acad. Sci. USA, 82, 7575-7579, 1985; Myers, R.M. et al., Science, 230, 1242-1246, 1985) and
other methods.

(2) Confirmation of the whole structure of the gene 

It has been confirmed that the DNA sequences of two cDNAs obtained by the above-described
procedure are novel and are those of the DNAs represented by SEQ ID NO:6 and SEQ ID NO:7. The
corresponding amino acid sequences have also been identified as those of the proteins represented by
SEQ ID NO:2 and SEQ ID NO:3. Moreover, 5'-RACE and RT-PCR have revealed the DNA sequence of the
DNA represented by SEQ ID NO:8, and the amino acid sequence of the protein represented by SEQ ID
NO:4 has been deduced as the one corresponding to that DNA sequence. Furthermore, with regard to genomic
DNA, the structure of the DNA represented by SEQ ID NO:9, including introns and exons, has been revealed
by analyzing the base sequence of the original cosmid clone cCI17-904 and comparing it with the base
sequence of the isolated cDNA clone to determine the intron-exon junctions.
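
The comparison of a genomic sequence with a cDNA sequence to locate intron-exon junctions can be sketched as follows. This is only a simplified illustration under strong assumptions: both sequences are error-free strings on the same strand, and exons are found by greedy exact matching. The function name `locate_exons`, the `min_exon` parameter and the toy sequences are hypothetical; they are not the sequences of SEQ ID NO:9 or of any cDNA clone described here, and greedy matching may misplace a junction by a few bases when an exon end and the adjacent intron start share bases.

```python
from typing import List, Tuple


def locate_exons(genomic: str, cdna: str, min_exon: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) genomic coordinates of successive cDNA segments.

    Greedy sketch: at each step, take the longest prefix of the remaining
    cDNA that occurs in the genomic sequence downstream of the previous exon.
    """
    exons: List[Tuple[int, int]] = []
    g_pos = 0  # search only downstream of the last exon found
    c_pos = 0  # number of cDNA bases already placed
    while c_pos < len(cdna):
        remaining = len(cdna) - c_pos
        for length in range(remaining, min_exon - 1, -1):
            start = genomic.find(cdna[c_pos:c_pos + length], g_pos)
            if start != -1:
                break
        else:
            raise ValueError(f"cDNA position {c_pos} cannot be placed on the genomic sequence")
        exons.append((start, start + length))
        g_pos = start + length
        c_pos += length
    return exons


# Toy example: two exons separated by a 12-base intron (GT...AG).
toy_genomic = "AAACCCGGG" + "GTAAGTCTCTAG" + "TTTACGTAC"
toy_cdna = "AAACCCGGG" + "TTTACGTAC"
print(locate_exons(toy_genomic, toy_cdna))  # [(0, 9), (21, 30)]
```

The gaps between consecutive coordinate pairs correspond to the introns; in the toy example the intron occupies genomic positions 9 to 20.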

Proteins comprising the whole or part of the amino acid sequence of the protein represented by SEQ ID
NO:1, which is an amino acid sequence common to all of the above-described proteins, have been named
MDC proteins by the present inventors and will hereinafter be referred to as MDC proteins.

The term "a part of the protein" means, for example, a polypeptide having or comprising an amino acid
sequence consisting of at least three contiguous amino acids described in SEQ ID NO:1. The
amino acid sequence consists of preferably at least three to five amino acids, still more preferably at least
eight or at least eight to ten amino acids, and most preferably at least eleven to twenty amino acids. It is to
be understood that polypeptides each having or comprising an amino acid sequence consisting of more than
20 contiguous amino acids described in SEQ ID NO:1 can also be used.
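
The notion of "a part of the protein" as a run of contiguous amino acids can be illustrated with a short sketch. The placeholder sequence `MKTAYIAK` is hypothetical and is not SEQ ID NO:1; the function names and the default minimum length of three residues are assumptions made only for this illustration.

```python
from typing import Iterator, Optional


def contiguous_fragments(sequence: str, min_len: int = 3,
                         max_len: Optional[int] = None) -> Iterator[str]:
    """Yield every contiguous sub-sequence with min_len..max_len residues."""
    upper = len(sequence) if max_len is None else min(max_len, len(sequence))
    for length in range(min_len, upper + 1):
        for start in range(len(sequence) - length + 1):
            yield sequence[start:start + length]


def is_part_of(peptide: str, sequence: str, min_len: int = 3) -> bool:
    """True if the peptide is a contiguous stretch of the sequence of at least min_len residues."""
    return len(peptide) >= min_len and peptide in sequence


placeholder = "MKTAYIAK"  # hypothetical sequence, not SEQ ID NO:1
print(list(contiguous_fragments(placeholder[:5], 3, 4)))
print(is_part_of("TAYIA", placeholder))
```

Raising `min_len` to 8 or 11 corresponds to the more preferred fragment lengths mentioned in the text.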

As used herein, the term "substantially equivalent" means that, in proteins comprising the whole or part
of the amino acid sequence of the protein represented by, for example, SEQ ID NO:1, the amino acid
sequences involve the replacement, deletion and/or insertion of one or more amino acids, but
the proteins produce an effect in research and diagnosis equal to that of the proteins comprising the whole
or part of the amino acid sequence of the protein represented by, for example, SEQ ID NO:1. Such equivalents
also fall within the scope of the present invention and are also called MDC proteins.

The DNA sequence common to all DNAs encoding MDC proteins is that of the DNA represented by
SEQ ID NO:5.

A DNA in accordance with the present invention can be utilized in gene analysis and diagnosis. That is, 
a primer or probe comprising a part of the DNA sequence of the DNA according to the present invention, or 
comprising a DNA sequence complementary to a part of the DNA sequence of the DNA according to the 
present invention is used in gene analysis and diagnosis. 

The part of the DNA sequence consists of at least six bases, preferably at least 8 bases, still more
preferably 10-12 bases and particularly preferably about 15-25 bases. That is, the oligonucleotide used as
primer or probe comprises at least six bases derived from the DNA sequence of the DNA according to the
present invention or derived from the DNA sequence complementary to the DNA sequence of the DNA
according to the present invention, and, if necessary, other base(s).
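
The length requirement for a primer or probe can be illustrated with a short sketch that checks whether an oligonucleotide contains at least six contiguous bases taken from a template sequence or from its complementary strand. The template and oligonucleotides below are hypothetical and are not taken from SEQ ID NO:5 to NO:9; the function names and the `min_shared` parameter are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def shares_stretch(oligo: str, template: str, min_shared: int = 6) -> bool:
    """True if the oligo contains at least min_shared contiguous bases that occur
    in the template or in its complementary strand."""
    strands = (template, reverse_complement(template))
    for start in range(len(oligo) - min_shared + 1):
        window = oligo[start:start + min_shared]
        if any(window in strand for strand in strands):
            return True
    return False


template = "ATGGCCAAGTTCGA"  # hypothetical template
print(shares_stretch("GGCCAAG", template))   # sense-strand stretch
print(shares_stretch("AACTTGGC", template))  # complementary-strand stretch
print(shares_stretch("TTTTTTTT", template))  # no shared stretch
```

Because only one window of `min_shared` bases has to match, an oligonucleotide carrying additional bases (for example a restriction site at its 5' end) still passes the check, in line with the phrase "and, if necessary, other base(s)".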

In connection with the DNAs of the present invention, the term "substantially equivalent" has the same
meaning as described above for the proteins, except that their base sequences involve the
replacement, deletion and/or insertion of one or more bases.

The introduction of replacement, deletion and insertion mutations into a particular base sequence can
be accomplished according to any of conventional methods including those described in F.M. Ausubel et
al., "Current Protocols in Molecular Biology", 1987, Chapter 8.
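
The three kinds of sequence change named above (replacement, deletion and insertion) can be shown on a string representation of a base sequence. This is only an in silico illustration of what each change does to the sequence, not a laboratory mutagenesis protocol; the helper names and the example sequence are hypothetical, and positions are counted from 0.

```python
def replace_base(seq: str, pos: int, base: str) -> str:
    """Replace the base at position pos with another base."""
    return seq[:pos] + base + seq[pos + 1:]


def delete_bases(seq: str, pos: int, count: int = 1) -> str:
    """Delete count bases starting at position pos."""
    return seq[:pos] + seq[pos + count:]


def insert_bases(seq: str, pos: int, bases: str) -> str:
    """Insert bases immediately before position pos."""
    return seq[:pos] + bases + seq[pos:]


original = "ATGGCC"  # hypothetical sequence
print(replace_base(original, 2, "C"))   # ATCGCC
print(delete_bases(original, 2, 2))     # ATCC
print(insert_bases(original, 3, "TT"))  # ATGTTGCC
```

A sequence derived in this way would count as "substantially equivalent" only if it also produces an equal effect in research and diagnosis, which a string operation cannot establish.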

The MDC protein encoded by the DNA according to the present invention, i.e., the MDC protein
according to the present invention, can be utilized by using it as an epitope to prepare an antibody. This
antibody can be used in experimental and diagnostic reagents. The term "epitope" means an antigenic
determinant of a polypeptide and is generally composed of at least 5 amino acids. It is well known that a
polypeptide composed of 6 amino acids binds with an antibody, as disclosed in, for example, Published
Japanese Translation of International Patent Application No. 60-500684.

(3) Recombinant expression vectors and transformants generated therewith

A transformant can be obtained by incorporating a DNA encoding human MDC protein, which has been
obtained by the above-described procedure, or a fragment thereof into a suitable vector and introducing this
vector into suitable host cells. By culturing this transformant in a conventional manner, large amounts of
human MDC protein can be obtained from the culture. More specifically, a recombinant expression vector
can be produced by linking a DNA encoding a human MDC protein or a fragment thereof on the
downstream side of the promoter of a vector suited for its expression according to a well-known method
using restriction enzymes and DNA ligase. Usable vectors include, for example, plasmids pBR322 and
pUC18 derived from Escherichia coli, plasmid pUB110 derived from Bacillus subtilis, plasmid pRB15
derived from yeast, phage vectors λgt10 and λgt11, and vector SV40 derived from an animal virus.
However, no particular limitation is placed on the type of vector used, so long as it can be replicated and
amplified in the host. Similarly, no particular limitation is placed on the promoter and terminator, so long as
they are compatible with the host used for the expression of the DNA base sequence encoding the human
MDC protein. They may be used in any suitable combination depending on the host. The DNA used can be
any of DNAs encoding human MDC protein. It is not limited to the base sequences represented by SEQ ID
NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8 and SEQ ID NO:9, but can be any of DNAs in which a
part of the base sequence has undergone replacement, deletion, insertion or a combination thereof, whether
intentionally or not. In addition, chemically synthesized DNAs can also be used.

A transformant is generated by introducing the recombinant expression vector thus obtained into a host
according to the competent cell method (J. Mol. Biol., 53, 154, 1970), the protoplast method (Proc. Natl.
Acad. Sci. USA, 75, 1929, 1978), the calcium phosphate method (Science, 221, 551, 1983), the in vitro
packaging method (Proc. Natl. Acad. Sci. USA, 72, 581, 1975) or the virus vector method (Cell, 37, 1053,
1984). The host used can be Escherichia coli, Bacillus subtilis, yeast or animal cells, and the resulting
transformant is grown in a suitable medium depending on the host. Usually, the transformant is grown at a
temperature of 20 to 45 °C and a pH of 5 to 8, optionally with aeration and stirring. Separation and
purification of the MDC protein from the culture may be carried out using a suitable combination of
well-known separation and purification techniques. These well-known techniques include salting-out, solvent
precipitation, dialysis, gel filtration, electrophoresis, ion exchange chromatography, affinity chromatography,
reverse-phase high-performance liquid chromatography and the like.

(4) Preparation of antibodies 

Antibodies can be prepared in the usual manner by using an antigen whose epitope portion comprises
an MDC protein. For example, a polyclonal antibody can be prepared by fully immunizing an animal such
as a mouse, guinea pig or rabbit through a plurality of subcutaneous, intramuscular, intraperitoneal or
intravenous injections of the antigen described above, collecting blood from this animal, and separating
serum therefrom. Commercially available adjuvants may also be used.

A monoclonal antibody can be prepared, for example, by immunizing a mouse with the antigen
described above, fusing its spleen cells with commercially available mouse myeloma cells to produce a
hybridoma, and collecting an antibody from the culture supernatant of the hybridoma or the ascites of a
mouse inoculated with the hybridoma.

The MDC protein which is used as antigen or is used to prepare an antigen need not necessarily have
the whole amino acid structure described in SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4,
but may have a partial structure of the amino acid sequence described in SEQ ID NO:1, SEQ ID NO:2, SEQ
ID NO:3 or SEQ ID NO:4. Alternatively, the MDC protein may be a variant or derivative of the MDC protein
represented by SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4. The antigen may be an MDC
protein as such, or a fusion peptide consisting of an MDC protein (including a peptide) and another peptide.
Preparation of the fusion peptide may be carried out according to either biological techniques or chemical
synthesis techniques.

These antibodies enable identification and determination of the MDC protein present in human
biological specimens and can hence be used as reagents for the diagnosis of cancer, and the like.

The immunological determination of the MDC protein can be made according to any conventional
technique. For example, any of the fluorescent antibody technique, the passive agglutination technique and
the enzyme antibody technique may be employed.

(5) Gene analysis of human tumor tissues 

The biological specimens which can be used for gene analysis include human normal tissues and
various types of human tumor tissues, as well as human blood, human body fluids, human secretions and
the like. The extraction and preparation of DNA can be carried out, for example, according to the method of
Sato et al. (Sato, T. et al., Cancer Res., 50, 7184, 1990).

The presence or absence of mutations of the gene can be analyzed by using, as probes, a restriction
fragment of the DNA encoding human MDC protein as provided by the present invention, or by selecting a
properly located base sequence from the DNA, synthesizing an oligonucleotide having the selected base
sequence and using the oligonucleotide as a primer.

These analyses can also detect other alterations, such as insertion and deletion, of the gene in 
samples. 

The base sequences selected for this purpose can be exon portions, intron portions, or junction portions
therebetween. It is a matter of course that artificially modified base sequences may be used. When an
artificially modified base sequence is used to prepare a primer, the corresponding gene mutation can be
detected by the gene analysis.

Analyses can be carried out, for example, by amplifying a partial sequence by PCR using two selected
sequences as primers and analyzing the base sequence of the amplification product directly, or by
incorporating the amplification product into a plasmid in the same manner as that described above,
transforming host cells with this plasmid, culturing the transformed cells, and analyzing the base sequence
of the clone thus obtained. Alternatively, the presence or absence of particular mutations of the gene in
samples can be directly detected by the use of the ligase chain reaction method (Wu et al., Genomics, 4,
560-569, 1989) or the mutant sequence specific PCR method (Ruano and Kidd, Nucleic Acids
Research, 17, 8392, 1989; C.R. Newton et al., Nucleic Acids Research, 17, 2503-2517, 1989).

Similarly, using probes containing the selected DNA sequences or RNA sequences derived therefrom, point
mutations can be detected by the SSCP method or the RNase protection method. Moreover, use of these
probes also makes it possible to detect mutations of the gene in samples by Southern hybridization and
abnormalities in the expression level of the gene in samples by northern hybridization.

Escherichia coli DH5/pBR1 and Escherichia coli XL1-Blue MRF'Kan/pCR-5P2, each carrying a plasmid
containing the DNA encoding this MDC protein, and Escherichia coli 490A/cCI17-904, carrying a cosmid
containing the genomic DNA, were deposited with the National Institute of Bioscience and
Human-Technology, Agency of Industrial Science and Technology, Ministry of International Trade and Industry on
April 28, 1993, February 8, 1994 and April 28, 1993 under accession numbers FERM BP-4286, FERM
BP-4555 and FERM BP-4287, respectively.

The MDC proteins and DNAs encoding the MDC proteins according to the present invention are
expected to be useful as reagents for cancer research, testing and diagnostic reagents, and therapeutic
agents.

Examples 

The present invention will now be described in more detail with reference to the following Examples 
which should not be considered to limit the scope of the present invention. 

Example 1 Isolation of cosmid clones specific for human chromosome 17 and construction of a
chromosome map

A human-mouse hybrid cell line (GM10331) containing a single human chromosome 17 in a mouse
genomic background was selected from among hybrid cells produced by fusing human normal cells with
cells of an established mouse cell line, and cosmid clones specific for human chromosome 17 were isolated
according to the method of Tokino et al. (Tokino et al., Am. J. Hum. Genet., 48, 258-268, 1991). The
chromosomal DNA of this hybrid cell line was properly digested with restriction enzyme Sau 3AI and the
ends of the fragments thus obtained were treated by partial filling-in with dATP and dGTP. Fragments
having a size of 35-42 kb were separated therefrom and inserted in cosmid vector pWEX15 which had
previously been digested with restriction enzyme Xho I and similarly treated at its ends by partial filling-in
with dCTP and dTTP. From among the resulting cosmid clones, clones containing human DNA fragments
were selected by colony hybridization using 32P-labeled human chromosomal DNA as probe. Thus, 342
cosmid clones specific for human chromosome 17 were isolated.

With regard to each of these cosmid clones specific for human chromosome 17, the location to which
its cosmid DNA hybridizes on the chromosome was determined by FISH (Inazawa et al., Genomics, 10,
1075-1078, 1991). Thus, a physical chromosome map for chromosome 17 was constructed (see Tables 1-3
and Fig. 1).

Using DNAs obtained from 6 unrelated individuals, the cosmid clones (cosmid markers) whose
hybridization locations on the chromosome had been determined were examined by a known
method (Nakamura et al., Am. J. Hum. Genet., 43, 854-859, 1988) in order to see whether RFLP could be
detected or not. The restriction enzyme used was Msp I, Taq I, Bgl II, Pst I, Pvu II, Rsa I or Eco RI. As a
result, RFLP was detected in 43 clones (see Tables 4-6). That is, these 43 clones were usable as RFLP
markers.

Example 2 Detection of commonly deleted regions of the human chromosome 17q in ovarian and breast
cancers

Tumor tissues were obtained from 94 patients with ovarian cancer and 246 patients with breast cancer
who underwent surgery. Corresponding normal tissues or peripheral blood samples were also obtained from
the respective patients. DNAs were extracted from these tissues or samples according to a known method
(Sato et al., Cancer Res., 50, 7184-7189, 1990). Each DNA was digested with suitable restriction enzymes,
and the fragments thus obtained were subjected to 1.0% agarose gel electrophoresis and then Southern
transferred to a nylon membrane with 0.1 N NaOH/0.1 M NaCl (Sato et al., Cancer Res., 50, 7184-7189,
1990).

The membranes thus obtained were examined for LOH (loss of heterozygosity) by Southern
hybridization (Sato et al., Cancer Res., 50, 7184-7189, 1990) using, as probes, the RFLP markers obtained by the
procedure of Example 1 (see Table 7).

A total of 84 among 94 ovarian tumors were informative for at least one locus, and 33 (39.3%) of them
showed LOH for at least one locus on chromosome 17q. Among 246 breast tumors examined, 214 were
informative for at least one locus, and 88 (41.4%) showed LOH for at least one locus on chromosome 17q.

From the above results, the instances which were informative for two or more loci and exhibited both
loss of heterozygosity at one locus and retention of heterozygosity at another locus on chromosome 17q were
summarized.

As a result, two commonly deleted regions were found in 8 ovarian cancers (see Fig. 2). One of them
was a region lying between markers CI17-316 (17q12-21.1) and CI17-507 (17q21.3), and the other was a
region distal to the marker CI17-516 (17q25.1).

Similarly, two commonly deleted regions were found in 35 breast cancers (see Fig. 3). One of them was
a region lying between markers CI17-701 (17q21.3) and CI17-730 (17q21.3), which was also found in the
ovarian cancers but was more narrowly localized. The other was a region lying on the terminal side of
marker CI17-516 (17q25.1), which was also the region where a deletion was observed in the ovarian
cancers.
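
The deletion-mapping logic used above can be illustrated with a minimal sketch. It rests on stated
assumptions: the marker names, their order and the per-tumor calls are hypothetical and are not data from
this study, and each tumor is assumed to carry a single contiguous deletion.

```python
# Hypothetical markers in chromosomal order (illustrative only, not study data).
MARKERS = ["M1", "M2", "M3", "M4", "M5", "M6"]

def deleted_interval(calls):
    """Return (lo, hi) marker indices that bound the possibly deleted region.

    calls maps marker -> "LOH" or "RET" (heterozygosity retained); markers that
    are absent are uninformative. The bounds are exclusive: the deletion lies
    strictly between the nearest retained markers flanking the LOH markers.
    """
    idx = {m: i for i, m in enumerate(MARKERS)}
    loh = sorted(idx[m] for m, c in calls.items() if c == "LOH")
    ret = sorted(idx[m] for m, c in calls.items() if c == "RET")
    if not loh:
        return None
    lo = max((i for i in ret if i < loh[0]), default=-1)
    hi = min((i for i in ret if i > loh[-1]), default=len(MARKERS))
    return lo, hi

def common_region(tumors):
    """Intersect the per-tumor intervals to find the commonly deleted region."""
    intervals = [iv for iv in map(deleted_interval, tumors) if iv is not None]
    if not intervals:
        return None
    lo = max(iv[0] for iv in intervals)
    hi = min(iv[1] for iv in intervals)
    return (lo, hi) if lo < hi else None

# Three hypothetical tumors, each informative at two or more loci.
TUMORS = [
    {"M1": "RET", "M3": "LOH", "M5": "RET"},
    {"M2": "RET", "M3": "LOH", "M4": "LOH", "M6": "RET"},
    {"M3": "LOH", "M4": "RET"},
]
region = common_region(TUMORS)
print(region, [MARKERS[i] for i in range(region[0] + 1, region[1])])
```

In this toy case the commonly deleted region is bounded by M2 and M4, in the same way that the flanking
markers bound the regions reported above.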

Of the two commonly deleted regions defined by the above-described deletion mapping, the region
flanked by markers CI17-701 and CI17-730 was found to lie close to the 17q21 region showing an intimate
correlation with the onset of cancer on the basis of the results of linkage mapping studies on hereditary
breast cancer and ovarian cancer (Hall et al., Am. J. Hum. Genet., 50, 1235-1242, 1992). The length of this
region (i.e., the genetic distance between the two markers) was estimated to be 2.4 cM by linkage analysis
(Lathrop et al., Am. J. Hum. Genet., 37, 482-498, 1985; Donis-Keller et al., Cell, 51, 319-337, 1987).

Example 3 Isolation of cosmid clones contained in the minimal localized region 

Since it has been shown that the region localized on the basis of the results of linkage mapping is a
region lying between markers THRA1 and Mfd188 on 17q21 (Hall et al., Am. J. Hum. Genet., 50,
1235-1242, 1992; Bowcock, A.M. et al., Am. J. Hum. Genet., 52, 718-22, 1993), an attempt was made to
determine the relative order of these markers and markers CI17-701 and CI17-730 and thereby combine the
mapping information obtained by two different strategies. The relative order of the markers was determined
by a two-color FISH method newly developed by the present inventors. This method is a modification of
FISH in which a highly extended chromosome preparation obtained by synchronization of the cells is used
to enhance the resolution and, moreover, probes labeled with fluorescent materials having different
colors are used. This method makes it possible to determine the relative order of markers very close to
each other.

As a result, it was found that marker Mfd188 lies between markers CI17-701 and CI17-730 and marker
THRA1 lies on the centromeric side of CI17-701 (see Fig. 4, a). That is, the region associated with
hereditary breast cancer as localized by linkage mapping and the commonly deleted region in sporadic
breast cancers as localized by deletion mapping overlapped each other and the overlapping minimal region
was flanked by markers CI17-701 and Mfd188 (see Fig. 4, a). When a physical map of this region was
constructed by pulsed-field gel electrophoresis, the length of the overlapping region was greatly narrowed
down to about 500 kb.

Furthermore, of the cosmid clones obtained by the procedure of Example 1, 37 clones localized to
17q21.3 and three known markers, THRA1, Mfd188 and PPY, were selected and used for fine mapping of
this chromosomal region by two-color FISH. As a result, 15 cosmid clones were located in a region flanked
by markers CI17-701 and CI17-730. Of these, two cosmid clones, CI17-527 and CI17-904, were found to lie
in the above-described overlapping region (see Fig. 4, a and b).

Example 4 Detection of genomic alterations in breast cancers 

Of the overlapping region of about 500 kb, about 150 kb has already been covered by four cosmid
clones CI17-701, CI17-527, CI17-904 and Mfd188. Accordingly, an attempt was first made to screen
restriction (Sac I, Pvu II or Pst I) fragments of the DNAs from the tumor tissues of 650 sporadic breast
cancers by Southern-blot analysis using the DNAs of these cosmid clones or fragments thereof as probes
and thereby detect gross structural genomic alterations (so-called genomic rearrangements), such as
deletion, duplication, amplification and translocation, having occurred in the tumor cells. As a result, when
the DNA of CI17-904 or its 9.5 kb Hind III fragment (see Fig. 4, c) was used as probe, genomic
rearrangements were detected in the tumor tissues of two breast cancers (see Fig. 5, a and b). These
genomic rearrangements occurred only in the tumor tissues, exhibiting extra bands of different size which
were not observed in normal tissues. In addition, the intensities of some bands were increased. That is, a
gene amplification occurred in a definite DNA region corresponding to (i.e., hybridizable with) this probe. In one
case among the above-mentioned two breast cancers, no gene amplification was detected when
Southern-blot analysis of the Sac I fragments of DNA from the breast cancer tissue was carried out by using the
E-H5.2 or Hind6.1 fragment adjacent to the 9.5 kb Hind III fragment (see Fig. 4, c) as probe (see Fig. 6, Case
1). This indicates that the gene amplification in this case occurred within the region corresponding to the 9.5
kb Hind III fragment and was a 4- to 5-fold amplification.

For the purpose of closer examination, Southern-blot analyses of the Sac I fragments of DNA from the
breast cancer tissue were carried out using each of six Sac I fragments derived from the 9.5 kb Hind III
fragment, A, B, C, D, E and F (see Fig. 4, c), as probe. As a result, amplified bands of abnormal size were
observed at 2.5 kb with probes A and B, at 3.0 kb with probes B, C and D, at 2.5 kb with probes E and F,
and at 0.9 kb with probe F (see Fig. 7).

In the other case, gene amplification was detected when Southern-blot analysis of the Sac I fragments
of DNA from the breast cancer tissue was carried out by using the E-H5.2 fragment as probe (see Fig. 6,
Case 2). However, no gene amplification was detected when the Hind6.1 fragment was used as probe (see
Fig. 6, Case 2). In this case, when the E-H5.2 fragment was used as probe, only an amplification was
observed, unaccompanied by any band of abnormal size. This indicates that the gene amplification
in this case occurred in a segment extending from within the region corresponding to the 9.5 kb Hind III
fragment to the outer (telomeric) side of the region corresponding to the E-H5.2 fragment.

Example 5 Isolation of cDNA and determination of its structure

In order to isolate an expressed gene in or near the region where genomic rearrangements were
detected in the two breast cancers, DNA fragments containing DNA sequences involved in fundamental
cellular functions and conserved among other species were selected from DNA fragments of cosmid clone
CI17-904. Specifically, each of the DNA fragments of cosmid clone CI17-904 was used as probe in
Southern blot hybridization analyses of DNA fragments from cow, pig, mouse, rat and chicken. As a result,
the 3.5 kb Hind III-Ksp I fragment (see Fig. 4, c) of cosmid clone CI17-904 hybridized to DNAs from cow,
pig, mouse and rat and showed strong conservation.

Using this 3.5 kb Hind III-Ksp I fragment as probe, human cDNA libraries derived from five different
sources (i.e., mammary gland, breast cancer cell line, fetal brain, cerebrum and cerebellum) were screened.
Thus, the longest cDNA was cloned from the cerebellar cDNA library. This cDNA hybridized to the 3.5 kb
Hind III-Ksp I fragment of cosmid clone CI17-904 and a plurality of adjoining restriction fragments, and
extended over a region of more than 20 kb on the chromosome.

Analysis of the base sequence of this cDNA revealed that it consisted of 2923 base pairs (bp) and was
a novel DNA base sequence containing a 5'-untranslated region of 27 bp, a coding region of 1575 bp, a
3'-untranslated region of 1306 bp, and a poly(A) tail of 15 bp (see SEQ ID NO:6). The open reading frame
contained in this cDNA sequence encoded a novel protein (MDC protein; see SEQ ID NO:2). An in-frame
termination codon was present immediately upstream of the first ATG of the open reading frame. A
polyadenylation signal, AATAAA, was observed about 20 bp upstream from the polyadenylation site.
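
The segment lengths quoted above can be cross-checked with a short calculation. This is only a sketch:
the variable names are hypothetical, and it assumes that the stated 1575 bp coding region includes the
termination codon, which is consistent with the 524 amino acids given for SEQ ID NO:2 in Example 11.

```python
# Segment lengths (bp) of the cDNA of SEQ ID NO:6 as stated in Example 5.
segments = {
    "5'-UTR": 27,
    "coding region": 1575,
    "3'-UTR": 1306,
    "poly(A) tail": 15,
}

total_bp = sum(segments.values())        # full length of the cDNA
codons = segments["coding region"] // 3  # assumed to include the stop codon
amino_acids = codons - 1                 # residues in the encoded protein

# 1-based positions of the coding region within the cDNA numbering.
cds_start = segments["5'-UTR"] + 1
cds_end = segments["5'-UTR"] + segments["coding region"]

print(total_bp, amino_acids, cds_start, cds_end)
```

Under these assumptions the four segments sum to the stated 2923 bp, and the coding region spans base
numbers 28 to 1602 of SEQ ID NO:6.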


Example 6 Determination of the structure of genomic DNA 

In order to clarify the structure of the genomic DNA corresponding to the cDNA obtained in Example 5,
cosmid clone CI17-904 was examined to determine the base sequences of portions containing the base
sequence of this cDNA and portions surrounding them. Then, the sequences of both were compared to
determine the exon-intron junctions. As a result, the sequence structure of a novel DNA containing 25 exons
corresponding to the cDNA obtained in Example 5 was clarified (see SEQ ID NO:9). Thus, it was shown that
these 25 exons are of relatively small size and are spread over a region of about 20 kb of the chromosome.

Example 7 Detection of alterations in the exon structure of the gene in breast cancers

From the structure of the DNA containing exons/introns as clarified in Example 6, it has become
apparent that exons 2, 3 and 4 are present in the sequence region of the probe (the 9.5 kb Hind III fragment
of cosmid clone CI17-904) with which alterations were detected in the tumor tissues of two breast cancers
as described in Example 4. More specifically, exon 2 is present in the sequence region of probe E, and
exons 3 and 4 are present in the sequence region of probe F (see Fig. 4, c). Accordingly, it is believed that
the gene rearrangements involving the 9.5 kb Hind III fragment region as described in Example 4 disrupt
the normal exon structure in the region containing the three exons of the gene. In order to confirm this, the
chromosomal DNAs from the tumor tissues of the above-described two breast cancers were examined by
Southern-blot analysis using probes having DNA sequences corresponding to exons 2, 3 and 4. Thus,
amplified bands of abnormal size were observed similarly to the previously described results obtained with
probe E or F (see Fig. 7).

Example 8 Tissue specificity of gene expression

mRNAs derived from various human tissues (brain, heart, kidney, liver, lung, pancreas, placenta,
skeletal muscle, colon, peripheral blood lymphocyte, ovary, small intestine, spleen, testis and thymus) were
examined by northern-blot analysis using the cDNA obtained in Example 5 as probe. As a result, the
strongest expression was observed in the brain, and a weak expression in the heart, ovary and testis.

Moreover, amplification by RT-PCR (reverse-transcriptase PCR) was performed to detect a weaker
expression. Specifically, using random hexamers as primers, single-stranded cDNAs were synthesized from
mRNAs derived from various human tissues under the action of reverse transcriptase. Then, PCR
amplification from these templates was performed using primers BC09 and BC012 having sequences
derived from the sequences of exons 21 and 23, respectively, which had been revealed in Example 6. As a
result, a PCR product having the expected size was observed mainly in tissues of the central nervous
system (cerebrum, cerebellum and fetal brain) and in endocrine or reproductive organs (testis, ovary,
mammary gland, adrenal gland, thymus and pancreas).

The sequences of the primers used are as follows:

BC09 5'-GCACCTGCCCCGGCAGT-3' (SEQ ID NO:10) (coding strand, corresponding to base
numbers 1764-1780 of SEQ ID NO:6)
BC012 5'-CCAGGACAGCCCCAGCGATG-3' (SEQ ID NO:11) (antisense strand, corresponding to
base numbers 1976-1957 of SEQ ID NO:6)
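
The primer coordinates listed above imply a product size, which can be worked out as follows. This is a
sketch under stated assumptions: the helper names are hypothetical, and the calculation assumes that the
amplified template matches SEQ ID NO:6 between the two primers with no length difference.

```python
BC09 = "GCACCTGCCCCGGCAGT"       # coding-strand primer, bases 1764-1780
BC012 = "CCAGGACAGCCCCAGCGATG"   # antisense primer, bases 1976-1957

def span_length(first, last):
    """Number of bases covered by an inclusive coordinate range (either order)."""
    return abs(last - first) + 1

def amplicon_length(sense_5p, antisense_5p):
    """Product length from the cDNA coordinates of the two primers' 5' ends."""
    return antisense_5p - sense_5p + 1

# Each primer length agrees with the coordinate range quoted for it.
print(len(BC09), span_length(1764, 1780))
print(len(BC012), span_length(1976, 1957))

# Expected product size on a template identical to SEQ ID NO:6.
print(amplicon_length(1764, 1976))
```

The same check can be applied to any primer pair whose coordinates are given on a common sequence.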

Example 9 Direct sequencing of mRNA by RT-PCR



mRNAs derived from human fetal brain and human testis were amplified by RT-PCR using primer
GMA701 having a sequence derived from the sequence on exon 19 and primer GMB704 having a
sequence derived from the sequence on exon 21. Then, the base sequences of the amplified DNAs were
directly determined using primer GMA702 or GMB703. As a result, a sequence in which 10 bases (base
numbers 1512-1521) were deleted from the cerebellar cDNA sequence of SEQ ID NO:6 obtained in
Example 5 was found, which revealed the expression of mRNA corresponding to the DNA sequence of
SEQ ID NO:7. Both the fetal brain and testis mRNAs gave identical results. The open reading frame
contained in the cDNA sequence of SEQ ID NO:7 encodes an MDC protein (see SEQ ID NO:3) composed
of 670 amino acids.

This appears to result from alternative RNA splicing at the start of exon 20, which begins at
base number 6083 instead of base number 6078 on the genomic DNA of SEQ ID NO:9. Such a variation of
splicing is also known from, for example, a report by Oda et al. [Biochem. Biophys. Res. Commun., 193,
897-904 (1993)]. As a result, the amino acid sequences encoded by the cDNA of SEQ ID NO:6 and the
cDNA of SEQ ID NO:7 differ from each other at and after that site (see SEQ ID NO:2 and SEQ ID NO:3).
Specifically, the cDNA of SEQ ID NO:6 produces a termination codon within exon 20, whereas the reading
frame is shifted in the cDNA of SEQ ID NO:7 so as to cause the open reading frame to continue to a more
downstream position.
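
The effect of a 10-base deletion on the reading frame can be illustrated with a toy example. The sequence
below is hypothetical and is not taken from SEQ ID NO:6; it only shows how a deletion whose length is not a
multiple of 3 can move a termination codon out of frame.

```python
def shifts_frame(deleted_bases):
    """A deletion shifts the downstream frame unless its length is a multiple of 3."""
    return deleted_bases % 3 != 0

def codons(seq):
    """Split a sequence into complete codons, reading from its first base."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Hypothetical toy sequence with an in-frame TAG termination codon.
toy = "ATGAAACCCGGGTTTAAATAGCCCGGG"

# Remove 10 bases (positions 4-13, 1-based), mimicking the size of the deletion.
spliced = toy[:3] + toy[13:]

print(codons(toy))      # TAG is read in frame
print(codons(spliced))  # the frame is shifted and TAG is no longer read
```

Because 10 is not divisible by 3, every codon downstream of the deletion is read in a different frame, so
the open reading frame can run past the original termination codon.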

The sequences of the primers used in PCR and DNA sequencing are as follows:

GMA701 5'-GGCTGCTGATCGCTTCTGCTAC-3' (SEQ ID NO:12) (coding strand, corresponding to
base numbers 1413-1434 in SEQ ID NO:6)
GMA702 5'-GAGAAGCTGAATGTGGAGGG-3' (SEQ ID NO:13) (coding strand, corresponding to base
numbers 1435-1456 in SEQ ID NO:6)
GMB703 5'-GTCAGAGCCGTCCGCCAGC-3' (SEQ ID NO:14) (antisense strand, corresponding to
base numbers 1675-1657 in SEQ ID NO:6)
GMB704 5'-GCCATCCTCCACATAGCTCAGG-3' (SEQ ID NO:15) (antisense strand, corresponding to
base numbers 1696-1655 in SEQ ID NO:6)

Example 10 Amplification of the 5'-terminal sequence by RACE

In order to obtain the full-length cDNA represented by SEQ ID NO:7, PCR amplification of the 5'-cDNA
terminus (5'-RACE; Frohman, et al., Proc. Natl. Acad. Sci. USA, 85, 8998-9002, 1988; Belyavski, et al.,
Nucleic Acids Res., 17, 2919-2932, 1988) was performed. Using specific oligomer SGN012 as primer,
together with a commercially available synthesis kit, a single-stranded cDNA was synthesized from 2 µg of
poly A(+) RNA derived from human brain (manufactured by Clontech). Then, 5'-RACE was performed
using a commercially available kit based on the method of Edwards et al. for linking an anchor oligomer to
an end of a single-stranded cDNA (Nucleic Acids Res., 19, 5227-5232, 1991). As a result of PCR using the
anchor oligomer of the kit and another specific oligomer SGN011 as primers, an amplification product of
about 580 bp was detected by electrophoresis.

This amplification product was extracted from the electrophoretic gel, purified, inserted in the Srf I
cleavage site of plasmid vector pCR-Script (manufactured by Stratagene), and cloned. Plasmid DNA was
purified from each clone and its base sequence was determined. One of the clones, pCR-5P2, had a cDNA
insert of 501 bp beginning with ATG, next to the sequence of the anchor oligomer. The base sequence of
the insert extending from base number 315 onward coincided exactly with the base sequence of SEQ ID
NO:7 extending from base number 45 (the initiation site of exon 2) onward, except for one base which will
be mentioned below. Furthermore, as far as the reading frame is concerned, that of pCR-5P2 beginning
with the first ATG corresponded with the polypeptide encoded by the cDNA of SEQ ID NO:7. The
N-terminal region of the polypeptide sequence thus obtained contained a signal peptide comprising a series of
hydrophobic amino acids.

RT-PCR was performed in order to confirm that the above 5'-terminal sequence obtained by 5'-RACE
was truly linked, on mRNA, to the sequence of SEQ ID NO:7 extending from base number 45 onward.
Using random hexamers as primers, single-stranded cDNAs were synthesized from poly A(+) RNAs
derived from human brain, fetal brain, ovary and testis (manufactured by Clontech). Then, the cDNA
templates were amplified by PCR using an oligomer (SGN013) having the first 20-base sequence of
pCR-5P2 as sense primer and SGN011 or SGN012 as antisense primer. As a result, the expected amplification
product (about 500 bp for SGN013/SGN011 and about 750 bp for SGN013/SGN012) was detected by
electrophoresis with every tissue RNA used.

Thus, it was confirmed that the 5'-terminal sequence of pCR-5P2 obtained by 5'-RACE was linked, on
mRNA, to the sequence of SEQ ID NO:7 extending from base number 45 onward, resulting in the
construction of a cDNA represented by SEQ ID NO:8. The open reading frame of the cDNA of SEQ ID NO:8
encodes an MDC protein composed of 769 amino acids (see SEQ ID NO:4).

The sequences of the specific oligomers used are as follows:

SGN011 5'-GATGTAAGTCAAGTTCCCATCAGAGA-3' (SEQ ID NO:16) (antisense strand,
corresponding to base numbers 231-206 in SEQ ID NO:7)
SGN012 5'-AACAGCTGGTGGTCGTTGATCACAA-3' (SEQ ID NO:17) (antisense strand,
corresponding to base numbers 485-461 in SEQ ID NO:7)
SGN013 5'-ATGAGGCTGCTGCGGCGCTG-3' (SEQ ID NO:18) (coding strand, corresponding to base
numbers 1-20 in SEQ ID NO:8)
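
The "about 500 bp" and "about 750 bp" products reported above can be reproduced from the primer
coordinates. This is a sketch, not part of the original procedure: the function names are hypothetical, and
it assumes that the pCR-5P2 insert numbering is the same as the SEQ ID NO:8 numbering, so that base 315 of
SEQ ID NO:8 aligns with base 45 of SEQ ID NO:7.

```python
# Offset between SEQ ID NO:8 and SEQ ID NO:7 numbering (base 315 <-> base 45).
OFFSET = 315 - 45

def to_seq8(pos_in_seq7):
    """Convert a SEQ ID NO:7 base number to SEQ ID NO:8 numbering."""
    return pos_in_seq7 + OFFSET

# 5' ends of the antisense primers, in SEQ ID NO:7 numbering (from the list above).
ANTISENSE_5P_END = {"SGN011": 231, "SGN012": 485}
SENSE_START_SEQ8 = 1  # SGN013 begins at base 1 of SEQ ID NO:8

def product_length(antisense_primer):
    """Expected RT-PCR product length with SGN013 as the sense primer."""
    end = to_seq8(ANTISENSE_5P_END[antisense_primer])
    return end - SENSE_START_SEQ8 + 1

for name in ("SGN011", "SGN012"):
    print(name, product_length(name))
```

Under these assumptions the calculated lengths are 501 bp and 755 bp, in line with the approximate sizes
observed by electrophoresis.
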
The above-mentioned one base in SEQ ID NO:8 after the initiation site of exon 2, differing from that
in SEQ ID NO:6 or SEQ ID NO:7, is the fourth base from the initiation site of exon 2, i.e., the C at
base number 318 in SEQ ID NO:8. The corresponding base in SEQ ID NO:6 or SEQ ID NO:7 is
the A at base number 48. The base C at base number 318 in SEQ ID NO:8 codes for His at
amino acid number 106 in SEQ ID NO:4. The base A at base number 48 in SEQ ID NO:6 or
SEQ ID NO:7 codes for Gln at amino acid number 7 in SEQ ID NO:2 or SEQ ID NO:3. This fact
reflects polymorphism.
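
The base and amino acid numbers given above are mutually consistent, as the following sketch shows. The
function names are hypothetical. The first two bases of the codon are assumed to be CA; this is inferred
from the stated His/Gln pair and the standard genetic code, not read from the sequence listing.

```python
def codon_number(base, cds_start):
    """1-based index of the codon containing a 1-based base position."""
    return (base - cds_start) // 3 + 1

def codon_position(base, cds_start):
    """Position (1, 2 or 3) of the base within its codon."""
    return (base - cds_start) % 3 + 1

# Subset of the standard genetic code needed for this comparison.
CODE = {"CAC": "His", "CAT": "His", "CAA": "Gln", "CAG": "Gln"}

# SEQ ID NO:8: the open reading frame starts at base 1.
print(codon_number(318, 1), codon_position(318, 1))

# SEQ ID NO:6 / NO:7: a 27 bp 5'-untranslated region, so coding starts at base 28.
print(codon_number(48, 28), codon_position(48, 28))

# Assumed codon prefix "CA" with the polymorphic third base.
print(CODE["CA" + "C"], CODE["CA" + "A"])
```

In both numbering systems the variable base falls at the third position of its codon, at codon 106 of
SEQ ID NO:4 and codon 7 of SEQ ID NO:2 or SEQ ID NO:3.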

An amino acid sequence common to these three variant MDC proteins (SEQ ID NO:2, SEQ ID NO:3
and SEQ ID NO:4) is a sequence composed of 488 amino acids (see SEQ ID NO:1), and a DNA sequence
encoding this portion is also a common sequence (see SEQ ID NO:5).

Example 11 Homology with known proteins

The amino acid sequences of MDC proteins showed homology with a family of snake venom
hemorrhagic proteins including HR1B (Takeya et al., J. Biol. Chem., 265, 16068-16073, 1990),
prorhodostomin (Au et al., Biochem. Biophys. Res. Commun., 181, 585-593, 1991) and protrigramin (Neeper et
al., Nucleic Acids Res., 18, 4255, 1990).

They also showed homology with the guinea pig sperm surface protein PH30 (Blobel et al., Nature, 356,
248-252, 1992) and the rat or monkey epididymis protein EAP1 (Perry et al., Biochem. J., 286, 671-675,
1992).

The homology of these proteins with the MDC proteins represented by SEQ ID NO:2 (524 amino acids)
and SEQ ID NO:4 (769 amino acids) is indicated by the following "percent identity/number of amino acids
in the tested region". The values for SEQ ID NO:2 are given on the left side and those for SEQ ID NO:4 on
the right side.

                  SEQ ID NO:2    SEQ ID NO:4
HR1B              32.5/335       32.2/379
prorhodostomin    29.0/420       29.0/420
protrigramin      27.7/430       28.1/438
PH30b             38.1/147       30.8/302
EAP1 (rat)        36.0/364       33.1/475
EAP1 (monkey)     30.4/503       29.9/599

Example 12 Generation of transformants

A DNA fragment encoding a part of the MDC protein represented by SEQ ID NO:2 was amplified from
the DNA (SEQ ID NO:6) encoding the MDC protein (SEQ ID NO:2) by PCR using primers SGN006 and
SGN008. The sequences of the primers used are as follows.

SGN006 5'-CACAGATCTGGGGGCATATGCTCCCTG-3' (SEQ ID NO:19) (coding strand,
corresponding to base numbers 766-783 in SEQ ID NO:6)
SGN008 5'-AACAAGCTTCTACTGATGTCTCCCACC-3' (SEQ ID NO:20) (antisense strand,
corresponding to base numbers 1602-1585 in SEQ ID NO:6; the underline designating a
termination codon.)

For purposes of vector construction, the 5'-termini of these primers are provided with Bgl II and Hind III
cleavage site sequences, respectively.
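
The layout of these two primers can be checked with a short sketch. The helper names are hypothetical; the
recognition sequences used (Bgl II AGATCT, Hind III AAGCTT, Bam HI GGATCC) are the standard ones, and the
coding-region end at base 1602 follows from the lengths stated in Example 5.

```python
SITES = {"Bgl II": "AGATCT", "Hind III": "AAGCTT", "Bam HI": "GGATCC"}

SGN006 = "CACAGATCTGGGGGCATATGCTCCCTG"   # coding strand, bases 766-783
SGN008 = "AACAAGCTTCTACTGATGTCTCCCACC"   # antisense strand, bases 1602-1585

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gene_specific_part(primer, enzyme):
    """Part of the primer that follows the added recognition site."""
    start = primer.find(SITES[enzyme])
    if start < 0:
        raise ValueError(f"{enzyme} site not found")
    return primer[start + len(SITES[enzyme]):]

fwd = gene_specific_part(SGN006, "Bgl II")
rev = gene_specific_part(SGN008, "Hind III")

print(len(fwd), 783 - 766 + 1)    # template-matching length vs. quoted range
print(len(rev), 1602 - 1585 + 1)
print(revcomp(rev)[-3:])          # last coding-strand codon covered by SGN008

# Bgl II and Bam HI sites share the central GATC that forms the cohesive end.
print(SITES["Bgl II"][1:5], SITES["Bam HI"][1:5])
```

The shared GATC overhang is what allows a Bgl II-cleaved fragment to be joined to a Bam HI-cleaved vector,
as is done in the constructions described in this Example.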

The PCR amplification product was separated by agarose gel electrophoresis and cleaved with Bgl II
and Hind III. The resulting DNA fragment encoding a part of the MDC protein was combined with vector
pMAL-c2 (manufactured by New England Biolabs) which had previously been cleaved with Bam HI and
Hind III to construct plasmid pMAL-MDC(C1).

Similarly, the same DNA fragment was combined with vector pQE-13 (manufactured by Diagen) which
had previously been cleaved with Bam HI and Hind III to construct plasmid pH6-MDC(C1).

Furthermore, a DNA sequence downstream from the Bam HI cleavage site (base number 1483 in SEQ
ID NO:6) was removed from the MDC protein encoding region of pMAL-MDC(C1) by cleaving
pMAL-MDC(C1) with Bam HI and Hind III, and recombining it after the formation of blunt ends. This resulted in the
construction of plasmid pMAL-MDC(dC1), which mediates expression of a polypeptide with an amino acid
sequence common to two variant MDC proteins (SEQ ID NO:2 and SEQ ID NO:3).
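The stated position of the Bam HI site can be cross-checked against the sequence listing with a small sketch. It assumes, as the listing shows, that SEQ ID NO:5 begins at the codon for residue 8 of SEQ ID NO:2 and that the coding region of SEQ ID NO:6 starts at base 28. The row of bases is the one ending at position 1440 in SEQ ID NO:5; the variable and function names are hypothetical.

```python
CDS_START_SEQ6 = 28          # first base of the coding region in SEQ ID NO:6
FIRST_RESIDUE_OF_SEQ5 = 8    # SEQ ID NO:5 starts at Leu-8 of SEQ ID NO:2

# Base 1 of SEQ ID NO:5 corresponds to base 49 of SEQ ID NO:6, so the offset is 48.
OFFSET = CDS_START_SEQ6 + 3 * (FIRST_RESIDUE_OF_SEQ5 - 1) - 1

# Row of SEQ ID NO:5 covering bases 1393-1440 (spaces removed).
ROW_START = 1393
row = "CTGAATGTGGAGGGGACGGAGCGTGGGAGCTGTGGGCGCAAGGGATCC"

BAM_HI = "GGATCC"
bam_in_seq5 = ROW_START + row.index(BAM_HI)
bam_in_seq6 = bam_in_seq5 + OFFSET


def residue_number(base: int, cds_start: int = CDS_START_SEQ6) -> int:
    """Residue of SEQ ID NO:2 whose codon contains the given base of SEQ ID NO:6."""
    return (base - cds_start) // 3 + 1


print(bam_in_seq6, residue_number(766), residue_number(bam_in_seq6))
```

Under these assumptions the site falls at base 1483, in the codon for residue 486 of SEQ ID NO:2, while the cloned fragment starts at residue 247.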

Since the fragment incorporated into vector pMAL-c2 is expressed as a fusion protein having a
maltose-binding protein (MBP) on the N-terminal side, this fusion protein was purified by affinity chromatography
using an amylose column. On the other hand, since the fragment incorporated into vector pQE-13 is
expressed as a fusion protein having a peptide (His6) composed of six histidine residues on the N-terminal
side, this fusion protein was purified by affinity chromatography using a metal chelate column.

Several transformants were obtained by transforming E. coli JM109 with each of plasmids
pMAL-MDC(C1), pMAL-MDC(dC1) and pH6-MDC(C1) and selecting for ampicillin resistance.

Example 13 Expression and purification of recombinant MDC proteins 

Each of the transformants obtained in Example 12 was grown and the resulting recombinant MDC 
fusion protein was extracted and purified from the culture. 

Specifically, 100 ml of LB medium (1% polypeptone, 0.5% yeast extract, 1% NaCl) was inoculated with
each transformant and incubated overnight at 37 °C with shaking. The culture was diluted 10-fold with LB
medium previously warmed to 37 °C and incubated for an additional 30-90 minutes to obtain a culture in the
logarithmic growth phase. To 1 liter of the culture was added IPTG (isopropyl-β-D-thiogalactopyranoside) so
as to give a final concentration of 1 mM. This culture was incubated for 3-4 hours and then centrifuged to
collect the cells therefrom.
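The volumes in this induction step can be worked through in a few lines. The sketch assumes a molecular weight of 238.30 g/mol for IPTG, a value not given in the text; the function name is hypothetical.

```python
IPTG_MW_G_PER_MOL = 238.30   # assumed molecular weight of IPTG, not stated in the text


def mass_for_final_concentration(final_mm: float, volume_l: float, mw: float) -> float:
    """Mass in grams needed to reach final_mm (millimolar) in volume_l liters."""
    return final_mm / 1000.0 * volume_l * mw


overnight_culture_ml = 100
dilution_factor = 10

# 100 ml diluted 10-fold gives the 1 liter culture that is induced.
induced_volume_l = overnight_culture_ml * dilution_factor / 1000.0

iptg_mg = 1000.0 * mass_for_final_concentration(1.0, induced_volume_l, IPTG_MW_G_PER_MOL)
print(f"{induced_volume_l:.1f} l culture, about {iptg_mg:.0f} mg IPTG for 1 mM")
```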

In the case of the transformants carrying plasmid pMAL-MDC(C1) or pMAL-MDC(dC1), the cells were suspended
in 10 ml of a column buffer (20 mM Tris-HCl, pH 7.4, 200 mM NaCl) and disintegrated by sonication. Since
the recombinant MDC fusion protein was present in the insoluble fraction of the disintegrated cell
suspension, this fraction was separated by centrifugation and dissolved in a denaturing buffer (8M urea, 20 mM
Tris-HCl, pH 8.5, 10 mM dithiothreitol). Then, this solution was dialyzed against the column buffer and
centrifuged to collect a supernatant soluble fraction. The dialyzed insoluble fraction was further denatured,
dialyzed and centrifuged repeatedly to collect additional supernatant soluble fractions. The combined
soluble fraction was applied to an amylose column (manufactured by New England Biolabs), which was
washed with the column buffer and eluted with the column buffer containing 10 mM maltose. The eluted
fractions were analyzed by absorptiometry at 280 nm and SDS-polyacrylamide gel electrophoresis (with
Coomassie Blue staining), and combined into fractions. As a result, a fraction in which the desired MBP
(maltose-binding protein) fusion protein (about 68 Kd) was detected as a principal band was obtained for
each of the transformants generated with plasmids pMAL-MDC(C1) and pMAL-MDC(dC1). The yields were
46.4 mg and 10.0 mg (when an OD280 of 1 was taken as 1 mg/ml), respectively. These fusion proteins will
hereinafter be referred to as MBP-MDC(C1) and MBP-MDC(dC1), respectively.

Similarly, in the case of the transformant carrying plasmid pH6-MDC(C1), the cells were suspended in 10 ml of a
sonication buffer (10 mM sodium phosphate, pH 8.0, 200 mM NaCl) and disintegrated by sonication. Since
the recombinant MDC fusion protein was present in the insoluble fraction of the disintegrated cell
suspension, this fraction was separated by centrifugation and dissolved in buffer A (6M guanidine hydrochloride,
100 mM NaH2PO4, 10 mM Tris-HCl, pH 8.0). Then, this solution was centrifuged to collect a supernatant
soluble fraction, which was applied to a Ni-NTA column (manufactured by Diagen). This column was
washed with buffer A and then buffer B (8M urea, 100 mM NaH2PO4, 10 mM Tris-HCl, pH 8.0), and eluted
stepwise with buffer C (8M urea, 100 mM NaH2PO4, 10 mM Tris-HCl, pH 6.3), buffer D (8M urea, 100 mM
NaH2PO4, 10 mM Tris-HCl, pH 5.9), buffer E (8M urea, 100 mM NaH2PO4, 10 mM Tris-HCl, pH 4.5) and
buffer F (6M guanidine hydrochloride, 200 mM acetic acid). The eluted fractions were analyzed by
absorptiometry at 280 nm and SDS-polyacrylamide gel electrophoresis (with Coomassie Blue staining), and
combined into fractions. As a result, a fraction in which the desired His6 fusion protein (about 34 Kd) was
detected as a single band was obtained from the effluent resulting from elution with buffer F. The yield was
51.9 mg (when an OD280 of 1 was taken as 1 mg/ml). This fusion protein will hereinafter be referred to as
His6-MDC(C1).
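The wash and elution series for the Ni-NTA column can be written down as a small data structure, which makes the stepwise drop in pH easy to see. This is only a bookkeeping sketch: the buffer compositions and pH values are those given above, while the class and function names are hypothetical. No pH is stated for buffer F, so it is recorded as unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Buffer:
    name: str
    denaturant: str
    ph: Optional[float]   # None where the text gives no pH
    role: str             # "wash" or "elute"


STEPS = [
    Buffer("A", "6 M guanidine hydrochloride", 8.0, "wash"),
    Buffer("B", "8 M urea", 8.0, "wash"),
    Buffer("C", "8 M urea", 6.3, "elute"),
    Buffer("D", "8 M urea", 5.9, "elute"),
    Buffer("E", "8 M urea", 4.5, "elute"),
    Buffer("F", "6 M guanidine hydrochloride", None, "elute"),  # 200 mM acetic acid
]


def ph_never_increases(steps: list[Buffer]) -> bool:
    """True if the stated pH values are non-increasing in the order applied."""
    values = [s.ph for s in steps if s.ph is not None]
    return all(a >= b for a, b in zip(values, values[1:]))


for step in STEPS:
    ph_text = "not stated" if step.ph is None else f"{step.ph:.1f}"
    print(f"buffer {step.name}: {step.role:5s} {step.denaturant}, pH {ph_text}")
print("pH non-increasing:", ph_never_increases(STEPS))
```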

Example 14 Preparation of a monoclonal antibody and a rabbit polyclonal antibody 


The three recombinant fusion proteins, His6-MDC(C1), MBP-MDC(dC1) and MBP-MDC(C1), obtained in
Example 13 were used as an immunizing antigen, an antigen for antibody purification and screening, and a
standard antigen for measurement, respectively.

An anti-MDC protein specific monoclonal antibody was prepared by immunizing a mouse with
His6-MDC(C1). Specifically, a solution of His6-MDC(C1) (500-1000 µg/ml) in 3 M urea/PBS was mixed with
complete adjuvant at a ratio of 1:1, and this mixture was injected into the peritoneal cavity of a mouse at a
dose of 100 µg per animal. This injection was repeated 4-6 times at intervals of 2 weeks. After completion
of the immunization, hybridomas were produced by fusing P3U1 cells with B cells in the presence of
PEG 1500. Then, hybridomas producing an anti-MDC protein specific antibody were selected by
monitoring the antibody titer in the culture supernatant.

In order to measure the antibody titer, a first reaction was effected by adding 100 µl of the culture
supernatant to a polystyrene cup having a solid phase formed from the MBP-MDC(dC1) fusion protein
obtained in Example 13 (5 µg/ml). After washing, a second reaction was effected by the addition of
anti-mouse IgG-HRP (horseradish peroxidase). After washing, a color reaction (third reaction) was effected by
the addition of an enzyme substrate solution [i.e., a mixed solution of hydrogen peroxide and ABTS
[2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid)]], and the produced color was monitored.

The hybridomas were grown on a 96-well multi-plate and screened by means of HAT medium. After
about 2 weeks, clones reacting specifically with the antigen were selected by measuring the antibody titer
in the culture supernatant. As a result of further cloning, 3 clones (G1-5A2-2C8, G2-2F2-3D11 and
G2-2D10-3F5) were established as antibody-producing hybridomas. The class and subclass of the antibody produced
by each of the established clones were IgG1 for G1-5A2-2C8, IgG2b for G2-2F2-3D11, and IgM for
G2-2D10-3F5. 3,000,000 cells of each hybridoma were introduced into the peritoneal cavity of a BALB/c mouse to
which 0.5 ml of pristane had been administered intraperitoneally about one week before. After 8-10 days,
the ascites was collected. From the ascites collected from each animal, an antibody was purified by affinity
chromatography using a protein G column.

Similarly, an anti-MDC protein polyclonal antibody was prepared by immunizing a rabbit with an
immunizing antigen comprising His6-MDC(C1) obtained in Example 13.

Specifically, as with the mouse, a rabbit was immunized with a mixture prepared by mixing a solution of
His6-MDC(C1) (500-1000 µg/ml) in 3 M urea/PBS with complete adjuvant at a ratio of 1:1. After completion
of the immunization, an antiserum was obtained and its antibody titer was measured using a polystyrene
cup having a solid phase formed from the MBP-MDC(dC1) fusion protein obtained in Example 13. The
antiserum was diluted 500- to 64,000-fold, 100 µl of each dilution was added to wells, and their
antibody titers were tested with goat anti-rabbit IgG-HRP. The antibody titer was detectable up to the
64,000-fold dilution. Since no antibody reacting with MBP-MDC(dC1) was present in the serum before
immunization, it could be confirmed that an antibody reacting specifically with the protein was produced.
Furthermore, this antiserum was purified by affinity chromatography using a protein G column and a
Sepharose column having the MBP-MDC(dC1) fusion protein immobilized thereon.
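The 500- to 64,000-fold range is what a two-fold serial dilution starting at 1:500 would give after eight steps. The text does not state the dilution factor, so the two-fold step below is an assumption, and the function name is hypothetical.

```python
def serial_dilutions(first: int, last: int, factor: int = 2) -> list[int]:
    """Fold-dilutions from first up to and including last, multiplying by factor each step."""
    dilutions = []
    current = first
    while current <= last:
        dilutions.append(current)
        current *= factor
    return dilutions


# Assumed two-fold series covering the range given in the text
series = serial_dilutions(500, 64000)
print(series)
```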

A method for the determination of the MDC protein by ELISA using the purified monoclonal antibody
and purified rabbit polyclonal antibody obtained in the above-described manner was established.

Specifically, the purified monoclonal antibody derived from a hybridoma (G2-2F2-3D11) was immobilized
on a 96-well plate and blocked with BSA (bovine serum albumin). Test solutions containing purified
MBP-MDC(C1) at concentrations of 0.156 to 5.00 µg/ml were prepared, added to wells in an amount of 100
µl per well, and reacted at room temperature for an hour. After the wells were washed, a solution (5 µg/ml)
of the purified rabbit polyclonal antibody was added in an amount of 100 µl per well and reacted at room
temperature for an hour. After the wells were washed, anti-rabbit IgG-HRP (5 µg/ml) was added in an
amount of 100 µl per well and reacted at room temperature for an hour. After completion of the reaction, 2
mM sodium azide was added in an amount of 100 µl per well and the absorbances at 405 nm and 490 nm
were measured. It was confirmed that the differential absorbances thus obtained were closely correlated
with the concentrations of the test solutions, exhibiting an approximately linear relationship in the range of 0
to 2.5 µg/ml (see Fig. 8). This indicates that ELISA using this monoclonal antibody and the rabbit polyclonal
antibody can be used as a method for the determination of the MDC protein.
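A standard curve of this kind could be used as sketched below. The sketch assumes a two-fold dilution series of the standard (which reproduces the stated 0.156 to 5.00 µg/ml range) and restricts the fit to the range described as approximately linear. The absorbance values are synthetic placeholders generated from an arbitrary line, not measurements from Fig. 8, and all function names are hypothetical.

```python
def two_fold_standards(top: float = 5.00, count: int = 6) -> list[float]:
    """Assumed two-fold dilution series, lowest concentration first (µg/ml)."""
    return [top / 2 ** i for i in range(count)][::-1]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(signal: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to estimate concentration from a signal."""
    return (signal - intercept) / slope


standards = two_fold_standards()
linear_range = [c for c in standards if c <= 2.5]

# Placeholder differential absorbances (A405 minus A490) from an arbitrary line;
# real plate readings would be substituted here.
placeholder_signal = [0.4 * c + 0.05 for c in linear_range]

slope, intercept = fit_line(linear_range, placeholder_signal)
print(round(concentration(0.45, slope, intercept), 3))
```

Samples reading above the highest standard used for the fit would need to be diluted and measured again, since the relationship is only described as linear up to 2.5 µg/ml.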

SEQUENCE LISTING

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 488 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) INTERMEDIATE SOURCE:

(A) LIBRARY: human fetal brain cDNA library

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Leu Leu Ser Ser Gln Tyr Val Glu Arg His Phe Ser Arg Glu Gly Thr
1 5 10 15

Thr Gln His Ser Thr Gly Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys
20 25 30

Leu Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly
35 40 45

Leu His Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro
50 55 60

Gln Glu Val Ala Gly Pro Trp Gly Ala Pro Gln Gly Pro Leu Pro His
65 70 75 80

Leu Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu
85 90 95

Pro Gly Cys Leu Phe Ala Val Pro Ala Gln Ser Ala Pro Pro Asn Arg
100 105 110

Pro Arg Leu Arg Arg Lys Arg Gln Val Arg Arg Gly His Pro Thr Val
115 120 125

His Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln
130 135 140

Leu Phe Glu Gln Met Arg Gln Ser Val Val Leu Thr Ser Asn Phe Ala
145 150 155 160

Lys Ser Val Val Asn Leu Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn
165 170 175

Thr Arg Ile Val Leu Val Ala Met Glu Thr Trp Ala Asp Gly Asp Lys
180 185 190

Ile Gln Val Gln Asp Asp Leu Leu Glu Thr Leu Ala Arg Leu Met Val
195 200 205

Tyr Arg Arg Glu Gly Leu Pro Glu Pro Ser Asn Ala Thr His Leu Phe
210 215 220

Ser Gly Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly
225 230 235 240

Gly Ile Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu Tyr Gly Asn
245 250 255

Met Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu
260 265 270

Gly Met Met Trp Asn Lys His Arg Ser Ser Ala Gly Asp Cys Lys Cys
275 280 285

Pro Asp Ile Trp Leu Gly Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu
290 295 300

Pro Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu
305 310 315 320

Gln Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu
325 330 335

Asp Pro Pro Glu Cys Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys
340 345 350

Asp Cys Gly Ser Val Gln Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys
355 360 365

Lys Lys Cys Thr Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys
370 375 380

Cys Arg Arg Cys Lys Tyr Glu Pro Arg Gly Val Ser Cys Arg Glu Ala
385 390 395 400

Val Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln
405 410 415

Cys Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu
420 425 430

Gln Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys
435 440 445

Gln Val Leu Trp Gly His Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys
450 455 460

Leu Asn Val Glu Gly Thr Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser
465 470 475 480

Gly Trp Val Gln Cys Ser Lys Gln
485 488

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 524 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) INTERMEDIATE SOURCE:

(A) LIBRARY: human fetal brain cDNA library

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Cys Trp Leu Ser His Gln Leu Leu Ser Ser Gln Tyr Val Glu Arg
1 5 10 15

His Phe Ser Arg Glu Gly Thr Thr Gln His Ser Thr Gly Ala Gly Asp
20 25 30

His Cys Tyr Tyr Gln Gly Lys Leu Arg Gly Asn Pro His Ser Phe Ala
35 40 45

Ala Leu Ser Thr Cys Gln Gly Leu His Gly Val Phe Ser Asp Gly Asn
50 55 60

Leu Thr Tyr Ile Val Glu Pro Gln Glu Val Ala Gly Pro Trp Gly Ala
65 70 75 80

Pro Gln Gly Pro Leu Pro His Leu Ile Tyr Arg Thr Pro Leu Leu Pro
85 90 95

Asp Pro Leu Gly Cys Arg Glu Pro Gly Cys Leu Phe Ala Val Pro Ala
100 105 110

Gln Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys Arg Gln Val
115 120 125

Arg Arg Gly His Pro Thr Val His Ser Glu Thr Lys Tyr Val Glu Leu
130 135 140

Ile Val Ile Asn Asp His Gln Leu Phe Glu Gln Met Arg Gln Ser Val
145 150 155 160

Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu Ala Asp Val
165 170 175

Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val Ala Met Glu
180 185 190

Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp Leu Leu Glu
195 200 205

Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu Pro Glu Pro
210 215 220

Ser Asn Ala Thr His Leu Phe Ser Gly Arg Thr Phe Gln Ser Thr Ser
225 230 235 240

Ser Gly Ala Ala Tyr Val Gly Gly Ile Cys Ser Leu Ser His Gly Gly
245 250 255

Gly Val Asn Glu Tyr Gly Asn Met Gly Ala Met Ala Val Thr Leu Ala
260 265 270

Gln Thr Leu Gly Gln Asn Leu Gly Met Met Trp Asn Lys His Arg Ser
275 280 285

Ser Ala Gly Asp Cys Lys Cys Pro Asp Ile Trp Leu Gly Cys Ile Met
290 295 300

Glu Asp Thr Gly Phe Tyr Leu Pro Arg Lys Phe Ser Arg Cys Ser Ile
305 310 315 320

Asp Glu Tyr Asn Gln Phe Leu Gln Glu Gly Gly Gly Ser Cys Leu Phe
325 330 335

Asn Lys Pro Leu Lys Leu Leu Asp Pro Pro Glu Cys Gly Asn Gly Phe
340 345 350

Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val Gln Glu Cys Ser
355 360 365

Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr Leu Thr His Asp Ala
370 375 380

Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys Lys Tyr Glu Pro Arg
385 390 395 400

Gly Val Ser Cys Arg Glu Ala Val Asn Glu Cys Asp Ile Ala Glu Thr
405 410 415

Cys Thr Gly Asp Ser Ser Gln Cys Pro Pro Asn Leu His Lys Leu Asp
420 425 430

Gly Tyr Tyr Cys Asp His Glu Gln Gly Arg Cys Tyr Gly Gly Arg Cys
435 440 445

Lys Thr Arg Asp Arg Gln Cys Gln Val Leu Trp Gly His Ala Ala Ala
450 455 460

Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr Glu Arg Gly
465 470 475 480

Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser Lys Gln Pro
485 490 495

Gln Gln Gly Arg Ala Val Trp Leu Pro Pro Leu Cys Gln His Leu Trp
500 505 510

Ser Ser Ser Ala Arg Gly Pro Gly Gly Arg His Gln
515 520 524

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 670 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) INTERMEDIATE SOURCE:

(A) LIBRARY: human fetal brain cDNA library

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Cys Trp Leu Ser His Gln Leu Leu Ser Ser Gln Tyr Val Glu Arg
1 5 10 15

His Phe Ser Arg Glu Gly Thr Thr Gln His Ser Thr Gly Ala Gly Asp
20 25 30

His Cys Tyr Tyr Gln Gly Lys Leu Arg Gly Asn Pro His Ser Phe Ala
35 40 45

Ala Leu Ser Thr Cys Gln Gly Leu His Gly Val Phe Ser Asp Gly Asn
50 55 60

Leu Thr Tyr Ile Val Glu Pro Gln Glu Val Ala Gly Pro Trp Gly Ala
65 70 75 80

Pro Gln Gly Pro Leu Pro His Leu Ile Tyr Arg Thr Pro Leu Leu Pro
85 90 95

Asp Pro Leu Gly Cys Arg Glu Pro Gly Cys Leu Phe Ala Val Pro Ala
100 105 110

Gln Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys Arg Gln Val
115 120 125

Arg Arg Gly His Pro Thr Val His Ser Glu Thr Lys Tyr Val Glu Leu
130 135 140

Ile Val Ile Asn Asp His Gln Leu Phe Glu Gln Met Arg Gln Ser Val
145 150 155 160

Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu Ala Asp Val
165 170 175

Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val Ala Met Glu
180 185 190

Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp Leu Leu Glu
195 200 205

Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu Pro Glu Pro
210 215 220

Ser Asn Ala Thr His Leu Phe Ser Gly Arg Thr Phe Gln Ser Thr Ser
225 230 235 240

Ser Gly Ala Ala Tyr Val Gly Gly Ile Cys Ser Leu Ser His Gly Gly
245 250 255

Gly Val Asn Glu Tyr Gly Asn Met Gly Ala Met Ala Val Thr Leu Ala
260 265 270

Gln Thr Leu Gly Gln Asn Leu Gly Met Met Trp Asn Lys His Arg Ser
275 280 285

Ser Ala Gly Asp Cys Lys Cys Pro Asp Ile Trp Leu Gly Cys Ile Met
290 295 300

Glu Asp Thr Gly Phe Tyr Leu Pro Arg Lys Phe Ser Arg Cys Ser Ile
305 310 315 320

Asp Glu Tyr Asn Gln Phe Leu Gln Glu Gly Gly Gly Ser Cys Leu Phe
325 330 335

Asn Lys Pro Leu Lys Leu Leu Asp Pro Pro Glu Cys Gly Asn Gly Phe
340 345 350

Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val Gln Glu Cys Ser
355 360 365

Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr Leu Thr His Asp Ala
370 375 380

Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys Lys Tyr Glu Pro Arg
385 390 395 400

Gly Val Ser Cys Arg Glu Ala Val Asn Glu Cys Asp Ile Ala Glu Thr
405 410 415

Cys Thr Gly Asp Ser Ser Gln Cys Pro Pro Asn Leu His Lys Leu Asp
420 425 430

Gly Tyr Tyr Cys Asp His Glu Gln Gly Arg Cys Tyr Gly Gly Arg Cys
435 440 445

Lys Thr Arg Asp Arg Gln Cys Gln Val Leu Trp Gly His Ala Ala Ala
450 455 460

Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr Glu Arg Gly
465 470 475 480

Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser Lys Gln Asp
485 490 495

Val Leu Cys Gly Phe Leu Leu Cys Val Asn Ile Ser Gly Ala Pro Arg
500 505 510

Leu Gly Asp Leu Val Gly Asp Ile Ser Ser Val Thr Phe Tyr His Gln
515 520 525

Gly Lys Glu Leu Asp Cys Arg Gly Gly His Val Gln Leu Ala Asp Gly
530 535 540

Ser Asp Leu Ser Tyr Val Glu Asp Gly Thr Ala Cys Gly Pro Asn Met
545 550 555 560

Leu Cys Leu Asp His Arg Cys Leu Pro Ala Ser Ala Phe Asn Phe Ser
565 570 575

Thr Cys Pro Gly Ser Gly Glu Arg Arg Ile Cys Ser His His Gly Val
580 585 590

Cys Ser Asn Glu Gly Lys Cys Ile Cys Gln Pro Asp Trp Thr Gly Lys
595 600 605

Asp Cys Ser Ile His Asn Pro Leu Pro Thr Ser Pro Pro Thr Gly Glu
610 615 620

Thr Glu Arg Tyr Lys Gly Pro Ser Gly Thr Asn Ile Ile Ile Gly Ser
625 630 635 640

Ile Ala Gly Ala Val Leu Val Ala Ala Ile Val Leu Gly Gly Thr Gly
645 650 655

Trp Gly Phe Lys Asn Ile Arg Arg Gly Arg Ser Gly Gly Ala
660 665 670

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 769 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) INTERMEDIATE SOURCE: 

(A) LIBRARY: human fetal brain cDNA library 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Arg Leu Leu Arg Arg Trp Ala Phe Ala Ala Leu Leu Leu Ser Leu
1 5 10 15

Leu Pro Thr Pro Gly Leu Gly Thr Gln Gly Pro Ala Gly Ala Leu Arg
20 25 30

Trp Gly Gly Leu Pro Gln Leu Gly Gly Pro Gly Ala Pro Glu Val Thr
35 40 45

Glu Pro Ser Arg Leu Val Arg Glu Ser Ser Gly Gly Glu Val Arg Lys
50 55 60

Gln Gln Leu Asp Thr Arg Val Arg Gln Glu Pro Pro Gly Gly Pro Pro
65 70 75 80

Val His Leu Ala Gln Val Ser Phe Val Ile Pro Ala Phe Asn Ser Asn
85 90 95

Phe Thr Leu Asp Leu Glu Leu Asn His His Leu Leu Ser Ser Gln Tyr
100 105 110

Val Glu Arg His Phe Ser Arg Glu Gly Thr Thr Gln His Ser Thr Gly
115 120 125

Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys Leu Arg Gly Asn Pro His
130 135 140

Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly Leu His Gly Val Phe Ser
145 150 155 160

Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln Glu Val Ala Gly Pro
165 170 175

Trp Gly Ala Pro Gln Gly Pro Leu Pro His Leu Ile Tyr Arg Thr Pro
180 185 190

Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu Pro Gly Cys Leu Phe Ala
195 200 205

Val Pro Ala Gln Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys
210 215 220

Arg Gln Val Arg Arg Gly His Pro Thr Val His Ser Glu Thr Lys Tyr
225 230 235 240

Val Glu Leu Ile Val Ile Asn Asp His Gln Leu Phe Glu Gln Met Arg
245 250 255

Gln Ser Val Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu
260 265 270

Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val
275 280 285

Ala Met Glu Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp
290 295 300

Leu Leu Glu Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu
305 310 315 320

Pro Glu Pro Ser Asn Ala Thr His Leu Phe Ser Gly Arg Thr Phe Gln
325 330 335

Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly Ile Cys Ser Leu Ser
340 345 350

His Gly Gly Gly Val Asn Glu Tyr Gly Asn Met Gly Ala Met Ala Val
355 360 365

Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly Met Met Trp Asn Lys
370 375 380

His Arg Ser Ser Ala Gly Asp Cys Lys Cys Pro Asp Ile Trp Leu Gly
385 390 395 400

Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu Pro Arg Lys Phe Ser Arg
405 410 415

Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu Gln Glu Gly Gly Gly Ser
420 425 430

Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu Asp Pro Pro Glu Cys Gly
435 440 445

Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val Gln
450 455 460

Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr Leu Thr
465 470 475 480

His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys Lys Tyr
485 490 495

Glu Pro Arg Gly Val Ser Cys Arg Glu Ala Val Asn Glu Cys Asp Ile
500 505 510

Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln Cys Pro Pro Asn Leu His
515 520 525



Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu Gln Gly Arg Cys Tyr Gly
530 535 540

Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys Gln Val Leu Trp Gly His
545 550 555 560

Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr
565 570 575

Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser
580 585 590

Lys Gln Asp Val Leu Cys Gly Phe Leu Leu Cys Val Asn Ile Ser Gly
595 600 605

Ala Pro Arg Leu Gly Asp Leu Val Gly Asp Ile Ser Ser Val Thr Phe
610 615 620

Tyr His Gln Gly Lys Glu Leu Asp Cys Arg Gly Gly His Val Gln Leu
625 630 635 640

Ala Asp Gly Ser Asp Leu Ser Tyr Val Glu Asp Gly Thr Ala Cys Gly
645 650 655

Pro Asn Met Leu Cys Leu Asp His Arg Cys Leu Pro Ala Ser Ala Phe
660 665 670

Asn Phe Ser Thr Cys Pro Gly Ser Gly Glu Arg Arg Ile Cys Ser His
675 680 685

His Gly Val Cys Ser Asn Glu Gly Lys Cys Ile Cys Gln Pro Asp Trp
690 695 700

Thr Gly Lys Asp Cys Ser Ile His Asn Pro Leu Pro Thr Ser Pro Pro
705 710 715 720

Thr Gly Glu Thr Glu Arg Tyr Lys Gly Pro Ser Gly Thr Asn Ile Ile
725 730 735

Ile Gly Ser Ile Ala Gly Ala Val Leu Val Ala Ala Ile Val Leu Gly
740 745 750

Gly Thr Gly Trp Gly Phe Lys Asn Ile Arg Arg Gly Arg Ser Gly Gly
755 760 765

Ala
769

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 1464 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) INTERMEDIATE SOURCE:

(A) LIBRARY: human fetal brain cDNA library

(ix) FEATURE

(A) NAME/KEY: CDS
(B) LOCATION: 1..1464

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CTC CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA 48
Leu Leu Ser Ser Gln Tyr Val Glu Arg His Phe Ser Arg Glu Gly Thr
1 5 10 15

ACC CAG CAC AGC ACC GGG GCT GGA GAC CAC TGC TAC TAC CAG GGG AAG 96
Thr Gln His Ser Thr Gly Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys
20 25 30

CTC CGG GGG AAC CCG CAC TCC TTC GCC GCC CTC TCC ACC TGC CAG GGG 144
Leu Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly
35 40 45

CTG CAT GGG GTC TTC TCT GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC 192
Leu His Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro
50 55 60

CAA GAG GTG GCT GGA CCT TGG GGA GCC CCT CAG GGA CCC CTT CCC CAC 240
Gln Glu Val Ala Gly Pro Trp Gly Ala Pro Gln Gly Pro Leu Pro His
65 70 75 80

CTC ATT TAC CGG ACC CCT CTC CTC CCA GAT CCC CTC GGA TGC AGG GAA 288
Leu Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu
85 90 95

CCA GGC TGC CTG TTT GCT GTG CCT GCC CAG TCG GCT CCT CCA AAC CGG 336
Pro Gly Cys Leu Phe Ala Val Pro Ala Gln Ser Ala Pro Pro Asn Arg
100 105 110

CCG AGG CTG AGA AGG AAA AGG CAG GTC CGC CGG GGC CAC CCT ACA GTG 384
Pro Arg Leu Arg Arg Lys Arg Gln Val Arg Arg Gly His Pro Thr Val
115 120 125

CAC AGT GAA ACC AAG TAT GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG 432
His Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln
130 135 140

CTG TTC GAG CAG ATG CGA CAG TCG GTG GTC CTC ACC AGC AAC TTT GCC 480
Leu Phe Glu Gln Met Arg Gln Ser Val Val Leu Thr Ser Asn Phe Ala
145 150 155 160

AAG TCC GTG GTG AAC CTG GCC GAT GTG ATA TAC AAG GAG CAG CTC AAC 528
Lys Ser Val Val Asn Leu Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn
165 170 175

ACT CGC ATC GTC CTG GTT GCC ATG GAA ACA TGG GCA GAT GGG GAC AAG 576
Thr Arg Ile Val Leu Val Ala Met Glu Thr Trp Ala Asp Gly Asp Lys
180 185 190

ATC CAG GTG CAG GAT GAC CTC CTG GAG ACC CTG GCC CGG CTC ATG GTC 624
Ile Gln Val Gln Asp Asp Leu Leu Glu Thr Leu Ala Arg Leu Met Val
195 200 205

TAC CGA CGG GAG GGT CTG CCT GAG CCC AGT AAT GCC ACC CAC CTC TTC 672
Tyr Arg Arg Glu Gly Leu Pro Glu Pro Ser Asn Ala Thr His Leu Phe
210 215 220

TCG GGC AGG ACC TTC CAG AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG 720
Ser Gly Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly
225 230 235 240

GGC ATA TGC TCC CTG TCC CAT GGC GGG GGT GTG AAC GAG TAC GGC AAC 768
Gly Ile Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu Tyr Gly Asn
245 250 255

ATG GGG GCG ATG GCC GTG ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG 816
Met Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu
260 265 270

GGC ATG ATG TGG AAC AAA CAC CGG AGC TCG GCA GGG GAC TGC AAG TGT 864
Gly Met Met Trp Asn Lys His Arg Ser Ser Ala Gly Asp Cys Lys Cys
275 280 285

CCA GAC ATC TGG CTG GGC TGC ATC ATG GAG GAC ACT GGG TTC TAC CTG 912
Pro Asp Ile Trp Leu Gly Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu
290 295 300

CCC CGC AAG TTC TCT CGC TGC AGC ATC GAC GAG TAC AAC CAG TTT CTG 960
Pro Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu
305 310 315 320

CAG GAG GGT GGT GGC AGC TGC CTC TTC AAC AAG CCC CTC AAG CTC CTG 1008
Gln Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu
325 330 335

GAC CCC CCA GAG TGC GGG AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC 1056
Asp Pro Pro Glu Cys Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys
340 345 350

GAC TGC GGC TCG GTG CAG GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC 1104
Asp Cys Gly Ser Val Gln Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys
355 360 365

AAG AAA TGC ACC CTG ACT CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC 1152
Lys Lys Cys Thr Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys
370 375 380

TGT CGC CGC TGC AAG TAC GAA CCA CGG GGT GTG TCC TGC CGA GAG GCC 1200
Cys Arg Arg Cys Lys Tyr Glu Pro Arg Gly Val Ser Cys Arg Glu Ala
385 390 395 400

GTG AAC GAG TGC GAC ATC GCG GAG ACC TGC ACC GGG GAC TCT AGC CAG 1248
Val Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln
405 410 415

TGC CCG CCT AAC CTG CAC AAG CTG GAC GGT TAC TAC TGT GAC CAT GAG 1296
Cys Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu
420 425 430

CAG GGC CGC TGC TAC GGA GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC 1344
Gln Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys
435 440 445

CAG GTT CTT TGG GGC CAT GCG GCT GCT GAT CGC TTC TGC TAC GAG AAG 1392
Gln Val Leu Trp Gly His Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys
450 455 460

CTG AAT GTG GAG GGG ACG GAG CGT GGG AGC TGT GGG CGC AAG GGA TCC 1440
Leu Asn Val Glu Gly Thr Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser
465 470 475 480

GGC TGG GTC CAG TGC AGT AAG CAG 1464
Gly Trp Val Gln Cys Ser Lys Gln
485
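The pairing of codons and residues in a listing such as SEQ ID NO:5 can be spot-checked with a short translation sketch. It uses the standard genetic code and translates only the first row of SEQ ID NO:5 as printed above; the function name is hypothetical. The same arithmetic confirms that the stated nucleotide lengths match the stated protein lengths.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding sequence (spaces allowed) into one-letter amino acids."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First row of SEQ ID NO:5 (bases 1-48)
first_row = "CTC CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA"
print(translate(first_row))

# Length bookkeeping from the listing headers
assert 1464 // 3 == 488              # SEQ ID NO:5 encodes SEQ ID NO:1
assert (1599 - 28 + 1) // 3 == 524   # CDS of SEQ ID NO:6 encodes SEQ ID NO:2
```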

(2) INFORMATION FOR SEQ ID NO: 6: 
(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 2923 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 
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(ii) MOLECULE TYPE: cDNA to mRNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) INTERMEDIATE SOURCE: 

(A) LIBRARY: human fetal brain cDNA libra 
(ix) FEATURE 

(A) NAME /KEY : 5' UTR 

(B) LOCATION: 1. .27 
(ix) FEATURE 

(A) NAME /KEY: 3* UTR 

(B) LOCATION: 1600.. 2923 
(ix) FEATURE 

(A) NAME /KEY : CDS 

(B) LOCATION: 28.. 1599 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

GCGTTTACTG GCAAACCGCA TTTGTAA ATG TGC TGG CTG AGC CAC CAA CTC 

Met Cys Trp Leu Ser His Gin Leu 
1 5 

CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA ACC 
Leu Ser Ser Gin Tyr Val Glu Arg His Phe Ser Arg Glu Gly Thr Thr 
10 15 20 

CAG CAC AGC ACC GGG GCT GGA GAC CAC TGC TAC TAC CAG GGG AAG CTC 
Gin His Ser Thr Gly Ala Gly Asp His Cys Tyr Tyr Gin Gly Lys Leu 
25 30 35 40 

CGG GGG AAC CCG CAC TCC TTC GCC GCC CTC TCC ACC TGC CAG GGG CTG 
Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser Thr Cys Gin Gly Leu 
45 50 55 



40 



10 



15 



20 



EP 0 633 268 A2 



CAT GGG GTC TTC TCT GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC CAA 243
His Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln
60 65 70

GAG GTG GCT GGA CCT TGG GGA GCC CCT CAG GGA CCC CTT CCC CAC CTC 291
Glu Val Ala Gly Pro Trp Gly Ala Pro Gln Gly Pro Leu Pro His Leu
75 80 85

ATT TAC CGG ACC CCT CTC CTC CCA GAT CCC CTC GGA TGC AGG GAA CCA 339
Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu Pro
90 95 100

GGC TGC CTG TTT GCT GTG CCT GCC CAG TCG GCT CCT CCA AAC CGG CCG 387
Gly Cys Leu Phe Ala Val Pro Ala Gln Ser Ala Pro Pro Asn Arg Pro
105 110 115 120

AGG CTG AGA AGG AAA AGG CAG GTC CGC CGG GGC CAC CCT ACA GTG CAC 435
Arg Leu Arg Arg Lys Arg Gln Val Arg Arg Gly His Pro Thr Val His
125 130 135

AGT GAA ACC AAG TAT GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG CTG 483
Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln Leu
140 145 150

TTC GAG CAG ATG CGA CAG TCG GTG GTC CTC ACC AGC AAC TTT GCC AAG 531
Phe Glu Gln Met Arg Gln Ser Val Val Leu Thr Ser Asn Phe Ala Lys
155 160 165

TCC GTG GTG AAC CTG GCC GAT GTG ATA TAC AAG GAG CAG CTC AAC ACT 579
Ser Val Val Asn Leu Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn Thr
170 175 180

CGC ATC GTC CTG GTT GCC ATG GAA ACA TGG GCA GAT GGG GAC AAG ATC 627
Arg Ile Val Leu Val Ala Met Glu Thr Trp Ala Asp Gly Asp Lys Ile
185 190 195 200

CAG GTG CAG GAT GAC CTC CTG GAG ACC CTG GCC CGG CTC ATG GTC TAC 675
Gln Val Gln Asp Asp Leu Leu Glu Thr Leu Ala Arg Leu Met Val Tyr
205 210 215

CGA CGG GAG GGT CTG CCT GAG CCC AGT AAT GCC ACC CAC CTC TTC TCG 723
Arg Arg Glu Gly Leu Pro Glu Pro Ser Asn Ala Thr His Leu Phe Ser
220 225 230

GGC AGG ACC TTC CAG AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG GGC 771
Gly Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly
235 240 245

ATA TGC TCC CTG TCC CAT GGC GGG GGT GTG AAC GAG TAC GGC AAC ATG 819
Ile Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu Tyr Gly Asn Met
250 255 260

GGG GCG ATG GCC GTG ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG GGC 867
Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly
265 270 275 280

ATG ATG TGG AAC AAA CAC CGG AGC TCG GCA GGG GAC TGC AAG TGT CCA 915
Met Met Trp Asn Lys His Arg Ser Ser Ala Gly Asp Cys Lys Cys Pro
285 290 295

GAC ATC TGG CTG GGC TGC ATC ATG GAG GAC ACT GGG TTC TAC CTG CCC 963
Asp Ile Trp Leu Gly Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu Pro
300 305 310

CGC AAG TTC TCT CGC TGC AGC ATC GAC GAG TAC AAC CAG TTT CTG CAG 1011
Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu Gln
315 320 325

GAG GGT GGT GGC AGC TGC CTC TTC AAC AAG CCC CTC AAG CTC CTG GAC 1059
Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu Asp
330 335 340

CCC CCA GAG TGC GGG AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC GAC 1107
Pro Pro Glu Cys Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp
345 350 355 360

TGC GGC TCG GTG CAG GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC AAG 1155
Cys Gly Ser Val Gln Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys
365 370 375

AAA TGC ACC CTG ACT CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC TGT 1203
Lys Cys Thr Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys
380 385 390

CGC CGC TGC AAG TAC GAA CCA CGG GGT GTG TCc TGC CGA GAG GCC GTG 1251
Arg Arg Cys Lys Tyr Glu Pro Arg Gly Val Ser Cys Arg Glu Ala Val
395 400 405

AAC GAG TGC GAC ATC GCG GAG ACC TGC ACC GGG GAC TCT AGC CAG TGC 1299
Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln Cys
410 415 420

CCG CCT AAC CTG CAC AAG CTG GAC GGT TAC TAC TGT GAC CAT GAG CAG 1347
Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu Gln
425 430 435 440

GGC CGC TGC TAC GGA GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC CAG 1395
Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys Gln
445 450 455

GTT CTT TGG GGC CAT GCG GCT GCT GAT CGC TTC TGC TAC GAG AAG CTG 1443
Val Leu Trp Gly His Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys Leu
460 465 470

AAT GTG GAG GGG ACG GAG CGT GGG AGC TGT GGG CGC AAG GGA TCC GGC 1491
Asn Val Glu Gly Thr Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser Gly
475 480 485

TGG GTC CAG TGC AGT AAG CAG CCC CAA CAG GGA CGT GCT GTG TGG CTT 1539
Trp Val Gln Cys Ser Lys Gln Pro Gln Gln Gly Arg Ala Val Trp Leu
490 495 500

CCT CCT CTG TGT CAA CAT CTC TGG AGC TCC TCG GCT AGG GGA CCT GGT 1587
Pro Pro Leu Cys Gln His Leu Trp Ser Ser Ser Ala Arg Gly Pro Gly
505 510 515 520

GGG AGA CAT CAG TAGTGTCACC TTCTACCACC AGGGCAAGGA GCTGGACTGC 1639
Gly Arg His Gln
524

AGGGGAGGCC ACGTGCAGCT GGCGGACGGC TCTGACCTGA GCTATGTGGA GGATGGCACA 1699
GCCTGCGGGC CTAACATGTT GTGCCTGGAC CATCGCTGCC TGCCAGCTTC TGCCTTCAAC 1759
TTCAGCACCT GCCCCGGCAG TGGGGAGCGC CGGATTTGCT CCCACCACGG GGTCTGCAGC 1819 
AATGAAGGGA AGTGCATCTG TCAGCCAGAC TGGACAGGCA AAGACTGCAG TATCCATAAC 1879 
CCCCTGCCCA CGTCCCCACC CACGGGGGAG ACGGAGAGAT ATAAAGGTCC CAGCGGCACC 1939 
AACATCATCA TTGGCTCCAT CGCTGGGGCT GTCCTGGTTG CAGCCATCGT CCTGGGCGGC 1999 
ACGGGCTGGG GATTTAAAAA CATTCGCCGA GGAAGGTCCG GAGGGGCCTA AGTGCCACCC 2059 
TCCTCCCTCC AAGCCTGGCA CCCACCGTCT CGGCCCTGAA CCACGAGGCT GCCCCCATCC 2119 
AGCCACGGAG GGAGGCACCA TGCAAATGTC TTCCAGGTCC AAACCCTTCA ACTCCTGGCT 2179 
CCGCAGGGGT TTGGGTGGGG GCTGTGGCCC TGCCCTTGGC ACCACCAGGG TGGACCAGGC 2239 
CTGGAGGGCA CTTCCTCCAC AGTCCCCCAC CCACCTCCTG CGGCTCAGCC TTGCACACCC 2299 
ACTGCCCCGT GTGAATGTAG CTTCCACCTC ATGGATTGCC ACAGCTCAAC TCGGGGGCAC 2359 
CTGGAGGGAT GCCCCCAGGC AGCCACCAGT GGACCTAGCC TGGATGGCCC CTCCTTGCAA 2419 
CCAGGCAGCT GAGACCAGGG TCTTATCTCT CTGGGACCTA GGGGGACGGG GCTGACATCT 2479 
ACATTTTTTA AAACTGAATC TTAATCGATG AATGTAAACT CGGGGGTGCT GGGGCCAGGG 2539 
CAGATGTGGG GATGTTTTGA CATTTACAGG AGGCCCCGGA GAAACTGAGG TATGGCCATG 2599 
CCCTAGACCC TCCCCAAGGA TGACCACACC CGAAGTCCTG TCACTGAGCA CAGTCAGGGG 2659 
CTGGGCATCC CAGCTTGCCC CCGCTTAGCC CCGCTGAGCT TGGAGGAAGT ATGAGTGCTG 2719 
ATTCAAACCA AAGCTGCCTG TGCCATGCCC AAGGCCTAGG TTATGGGTAC GGCAACCACA 2779
TGTCCCAGAT CGTCTCCAAT TCGAAAACAA CCGTCCTGCT GTCCCTGTCA GGACACATGG 2839 
ATTTTGGCAG GGCGGGGGGG GGTTCTAGAA AATATAGGTT CCTATAATAA AATGGCACCT 2899 
TCCCCCTTTA AAAAAAAAAA AAAA 2923 
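
As an illustrative cross-check, not part of the original listing, the feature coordinates given for SEQ ID NO: 6 can be tested for internal consistency. The following Python sketch uses hypothetical helper names (`span`, `tiles_sequence`) and assumes 1-based inclusive coordinates as printed above; it confirms that the three features cover the 2923 base pairs without gaps and that the CDS length corresponds to the 524 residues shown.

```python
# Feature coordinates as printed for SEQ ID NO: 6 (1-based, inclusive).
SEQ_LENGTH = 2923
FEATURES = {"5' UTR": (1, 27), "CDS": (28, 1599), "3' UTR": (1600, 2923)}


def span(start, end):
    """Number of bases covered by an inclusive 1-based interval."""
    return end - start + 1


def tiles_sequence(features, length):
    """True if the intervals cover 1..length with no gaps and no overlaps."""
    ordered = sorted(features.values())
    if ordered[0][0] != 1 or ordered[-1][1] != length:
        return False
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


cds_len = span(*FEATURES["CDS"])   # 1599 - 28 + 1
codons = cds_len // 3              # number of sense codons in the CDS

print(tiles_sequence(FEATURES, SEQ_LENGTH), cds_len, codons)
```

Under these assumptions the CDS spans 1572 bases, i.e. 524 codons, matching the final residue number (Gln 524) in the listing; the TAG stop codon at positions 1600 to 1602 falls within the interval annotated as 3' UTR.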

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 2913 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA to mRNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) INTERMEDIATE SOURCE: 

(A) LIBRARY: human fetal brain cDNA library
(ix) FEATURE

(A) NAME/KEY: 5' UTR

(B) LOCATION: 1..27
(ix) FEATURE

(A) NAME/KEY: 3' UTR

(B) LOCATION: 2038..2913
(ix) FEATURE

(A) NAME/KEY: CDS

(B) LOCATION: 28..2037

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GCGTTTACTG GCAAACCGCA TTTGTAA ATG TGC TGG CTG AGC CAC CAA CTC 51
Met Cys Trp Leu Ser His Gln Leu
1 5

CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA ACC 99
Leu Ser Ser Gln Tyr Val Glu Arg His Phe Ser Arg Glu Gly Thr Thr
10 15 20

CAG CAC AGC ACC GGG GCT GGA GAC CAC TGC TAC TAC CAG GGG AAG CTC 147
Gln His Ser Thr Gly Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys Leu
25 30 35 40

CGG GGG AAC CCG CAC TCC TTC GCC GCC CTC TCC ACC TGC CAG GGG CTG 195
Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly Leu
45 50 55

CAT GGG GTC TTC TCT GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC CAA 243
His Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln
60 65 70

GAG GTG GCT GGA CCT TGG GGA GCC CCT CAG GGA CCC CTT CCC CAC CTC 291
Glu Val Ala Gly Pro Trp Gly Ala Pro Gln Gly Pro Leu Pro His Leu
75 80 85

ATT TAC CGG ACC CCT CTC CTC CCA GAT CCC CTC GGA TGC AGG GAA CCA 339
Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu Pro
90 95 100

GGC TGC CTG TTT GCT GTG CCT GCC CAG TCG GCT CCT CCA AAC CGG CCG 387
Gly Cys Leu Phe Ala Val Pro Ala Gln Ser Ala Pro Pro Asn Arg Pro
105 110 115 120

AGG CTG AGA AGG AAA AGG CAG GTC CGC CGG GGC CAC CCT ACA GTG CAC 435
Arg Leu Arg Arg Lys Arg Gln Val Arg Arg Gly His Pro Thr Val His
125 130 135

AGT GAA ACC AAG TAT GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG CTG 483
Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln Leu
140 145 150

TTC GAG CAG ATG CGA CAG TCG GTG GTC CTC ACC AGC AAC TTT GCC AAG 531
Phe Glu Gln Met Arg Gln Ser Val Val Leu Thr Ser Asn Phe Ala Lys
155 160 165

TCC GTG GTG AAC CTG GCC GAT GTG ATA TAC AAG GAG CAG CTC AAC ACT 579
Ser Val Val Asn Leu Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn Thr
170 175 180

CGC ATC GTC CTG GTT GCC ATG GAA ACA TGG GCA GAT GGG GAC AAG ATC 627
Arg Ile Val Leu Val Ala Met Glu Thr Trp Ala Asp Gly Asp Lys Ile
185 190 195 200

CAG GTG CAG GAT GAC CTC CTG GAG ACC CTG GCC CGG CTC ATG GTC TAC 675
Gln Val Gln Asp Asp Leu Leu Glu Thr Leu Ala Arg Leu Met Val Tyr
205 210 215

CGA CGG GAG GGT CTG CCT GAG CCC AGT AAT GCC ACC CAC CTC TTC TCG 723
Arg Arg Glu Gly Leu Pro Glu Pro Ser Asn Ala Thr His Leu Phe Ser
220 225 230

GGC AGG ACC TTC CAG AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG GGC 771
Gly Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly
235 240 245

ATA TGC TCC CTG TCC CAT GGC GGG GGT GTG AAC GAG TAC GGC AAC ATG 819
Ile Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu Tyr Gly Asn Met
250 255 260

GGG GCG ATG GCC GTG ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG GGC 867
Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly
265 270 275 280

ATG ATG TGG AAC AAA CAC CGG AGC TCG GCA GGG GAC TGC AAG TGT CCA 915
Met Met Trp Asn Lys His Arg Ser Ser Ala Gly Asp Cys Lys Cys Pro
285 290 295

GAC ATC TGG CTG GGC TGC ATC ATG GAG GAC ACT GGG TTC TAC CTG CCC 963
Asp Ile Trp Leu Gly Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu Pro
300 305 310

CGC AAG TTC TCT CGC TGC AGC ATC GAC GAG TAC AAC CAG TTT CTG CAG 1011
Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu Gln
315 320 325

GAG GGT GGT GGC AGC TGC CTC TTC AAC AAG CCC CTC AAG CTC CTG GAC 1059
Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu Asp
330 335 340

CCC CCA GAG TGC GGG AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC GAC 1107
Pro Pro Glu Cys Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp
345 350 355 360

TGC GGC TCG GTG CAG GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC AAG 1155
Cys Gly Ser Val Gln Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys
365 370 375

AAA TGC ACC CTG ACT CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC TGT 1203
Lys Cys Thr Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys
380 385 390

CGC CGC TGC AAG TAC GAA CCA CGG GGT GTG TCc TGC CGA GAG GCC GTG 1251
Arg Arg Cys Lys Tyr Glu Pro Arg Gly Val Ser Cys Arg Glu Ala Val
395 400 405

AAC GAG TGC GAC ATC GCG GAG ACC TGC ACC GGG GAC TCT AGC CAG TGC 1299
Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln Cys
410 415 420

CCG CCT AAC CTG CAC AAG CTG GAC GGT TAC TAC TGT GAC CAT GAG CAG 1347
Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu Gln
425 430 435 440

GGC CGC TGC TAC GGA GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC CAG 1395
Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys Gln
445 450 455

GTT CTT TGG GGC CAT GCG GCT GCT GAT CGC TTC TGC TAC GAG AAG CTG 1443
Val Leu Trp Gly His Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys Leu
460 465 470

AAT GTG GAG GGG ACG GAG CGT GGG AGC TGT GGG CGC AAG GGA TCC GGC 1491
Asn Val Glu Gly Thr Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser Gly
475 480 485

TGG GTC CAG TGC AGT AAG CAG GAC GTG CTG TGT GGC TTC CTC CTC TGT 1539
Trp Val Gln Cys Ser Lys Gln Asp Val Leu Cys Gly Phe Leu Leu Cys
490 495 500

GTC AAC ATC TCT GGA GCT CCT CGG CTA GGG GAC CTG GTG GGA GAC ATC 1587
Val Asn Ile Ser Gly Ala Pro Arg Leu Gly Asp Leu Val Gly Asp Ile
505 510 515 520

AGT AGT GTC ACC TTC TAC CAC CAG GGC AAG GAG CTG GAC TGC AGG GGA 1635
Ser Ser Val Thr Phe Tyr His Gln Gly Lys Glu Leu Asp Cys Arg Gly
525 530 535

GGC CAC GTG CAG CTG GCG GAC GGC TCT GAC CTG AGC TAT GTG GAG GAT 1683
Gly His Val Gln Leu Ala Asp Gly Ser Asp Leu Ser Tyr Val Glu Asp
540 545 550

GGC ACA GCC TGC GGG CCT AAC ATG TTG TGC CTG GAC CAT CGC TGC CTG 1731
Gly Thr Ala Cys Gly Pro Asn Met Leu Cys Leu Asp His Arg Cys Leu
555 560 565

CCA GCT TCT GCC TTC AAC TTC AGC ACC TGC CCC GGC AGT GGG GAG CGC 1779
Pro Ala Ser Ala Phe Asn Phe Ser Thr Cys Pro Gly Ser Gly Glu Arg
570 575 580

CGG ATT TGC TCC CAC CAC GGG GTC TGC AGC AAT GAA GGG AAG TGC ATC 1827
Arg Ile Cys Ser His His Gly Val Cys Ser Asn Glu Gly Lys Cys Ile
585 590 595 600

TGT CAG CCA GAC TGG ACA GGC AAA GAC TGC AGT ATC CAT AAC CCC CTG 1875
Cys Gln Pro Asp Trp Thr Gly Lys Asp Cys Ser Ile His Asn Pro Leu
605 610 615

CCC ACG TCC CCA CCC ACG GGG GAG ACG GAG AGA TAT AAA GGT CCC AGC 1923
Pro Thr Ser Pro Pro Thr Gly Glu Thr Glu Arg Tyr Lys Gly Pro Ser
620 625 630

GGC ACC AAC ATC ATC ATT GGC TCC ATC GCT GGG GCT GTC CTG GTT GCA 1971
Gly Thr Asn Ile Ile Ile Gly Ser Ile Ala Gly Ala Val Leu Val Ala
635 640 645

GCC ATC GTC CTG GGC GGC ACG GGC TGG GGA TTT AAA AAC ATT CGC CGA 2019
Ala Ile Val Leu Gly Gly Thr Gly Trp Gly Phe Lys Asn Ile Arg Arg
650 655 660

GGA AGG TCC GGA GGG GCC TAAGTGCCAC CCTCCTCCCT CCAAGCCTGG 2067
Gly Arg Ser Gly Gly Ala
665 670

CACCCACCGT CTCGGCCCTG AACCACGAGG CTGCCCCCAT CCAGCCACGG AGGGAGGCAC 2127
CATGCAAATG TCTTCCAGGT CCAAACCCTT CAACTCCTGG CTCCGCAGGG GTTTGGGTGG 2187
GGGCTGTGGC CCTGCCCTTG GCACCACCAG GGTGGACCAG GCCTGGAGGG CACTTCCTCC 2247
ACAGTCCCCC ACCCACCTCC TGCGGCTCAG CCTTGCACAC CCACTGCCCC GTGTGAATGT 2307
AGCTTCCACC TCATGGATTG CCACAGCTCA ACTCGGGGGC ACCTGGAGGG ATGCCCCCAG 2367
GCAGCCACCA GTGGACCTAG CCTGGATGGC CCCTCCTTGC AACCAGGCAG CTGAGACCAG 2427
GGTCTTATCT CTCTGGGACC TAGGGGGACG GGGCTGACAT CTACATTTTT TAAAACTGAA 2487
TCTTAATCGA TGAATGTAAA CTCGGGGGTG CTGGGGCCAG GGCAGATGTG GGGATGTTTT 2547
GACATTTACA GGAGGCCCCG GAGAAACTGA GGTATGGCCA TGCCCTAGAC CCTCCCCAAG 2607
GATGACCACA CCCGAAGTCC TGTCACTGAG CACAGTCAGG GGCTGGGCAT CCCAGCTTGC 2667
CCCCGCTTAG CCCCGCTGAG CTTGGAGGAA GTATGAGTGC TGATTCAAAC CAAAGCTGCC 2727
TGTGCCATGC CCAAGGCCTA GGTTATGGGT ACGGCAACCA CATGTCCCAG ATCGTCTCCA 2787
ATTCGAAAAC AACCGTCCTG CTGTCCCTGT CAGGACACAT GGATTTTGGC AGGGCGGGGG 2847
GGGGTTCTAG AAAATATAGG TTCCTATAAT AAAATGGCAC CTTCCCCCTT TAAAAAAAAA 2907
AAAAAA 2913
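
For readers who wish to check the amino acid lines against the nucleotide lines, the following Python sketch (an illustration only, not part of the original listing) translates codons with the standard genetic code. The names `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical, and the example input is the first eight codons of the CDS of SEQ ID NO: 7 as printed above.

```python
from itertools import product

# Standard genetic code, enumerated with each base position in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def translate(dna):
    """Translate a DNA string (spaces and lower case allowed) codon by codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return " ".join(
        THREE_LETTER[CODON_TABLE[dna[i:i + 3]]] for i in range(0, usable, 3)
    )


# First eight codons of the CDS of SEQ ID NO: 7 (positions 28 to 51).
print(translate("ATG TGC TGG CTG AGC CAC CAA CTC"))
```

The function upper-cases its input, so codons printed with lower-case letters in the listing (for example "TCc") are handled as well. Ambiguous bases such as N are not covered by this sketch and would raise a `KeyError`.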

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 3183 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA to mRNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) INTERMEDIATE SOURCE: 

(A) LIBRARY: human fetal brain cDNA library 
(ix) FEATURE

(A) NAME/KEY: 3' UTR

(B) LOCATION: 2308..3183
(ix) FEATURE

(A) NAME/KEY: CDS

(B) LOCATION: 1..2307

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

ATG AGG CTG CTG CGG CGC TGG GCG TTC GCG GCT CTG CTG CTG TCG CTG 48
Met Arg Leu Leu Arg Arg Trp Ala Phe Ala Ala Leu Leu Leu Ser Leu
1 5 10 15

CTC CCC ACG CCC GGT CTT GGG ACC CAA GGT ccT GCT GGA GCT CTG Cga 96
Leu Pro Thr Pro Gly Leu Gly Thr Gln Gly Pro Ala Gly Ala Leu Arg
20 25 30

TGG GGG GGC TTA CCC CAG CTG GGA GGC CCA GGA GCC CCT GAG GTC ACG 144
Trp Gly Gly Leu Pro Gln Leu Gly Gly Pro Gly Ala Pro Glu Val Thr
35 40 45

GAA CCC AGC CGT CTG GTT AGG GAG AGC TCC GGG GGA GAG GTC CGA AAG 192
Glu Pro Ser Arg Leu Val Arg Glu Ser Ser Gly Gly Glu Val Arg Lys
50 55 60

CAG CAG CTG GAC ACA AGG GTC CGC CAG GAG CCA CCA GGG GGC CCG CCT 240
Gln Gln Leu Asp Thr Arg Val Arg Gln Glu Pro Pro Gly Gly Pro Pro
65 70 75 80

(nucleotide residues 241 to 288 are illegible in the source) 288
Val His Leu Ala Gln Val Ser Phe Val Ile Pro Ala Phe Asn Ser Asn
85 90 95

TTC ACC CTG GAC CTG GAG CTG AAC CAC CAc CTC CTC TCC TCG CAA TAC 336
Phe Thr Leu Asp Leu Glu Leu Asn His His Leu Leu Ser Ser Gln Tyr
100 105 110

GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA ACC CAG CAC AGC ACC GGG 384
Val Glu Arg His Phe Ser Arg Glu Gly Thr Thr Gln His Ser Thr Gly
115 120 125

GCT GGA GAC CAC TGC TAC TAC CAG GGG AAG CTC CGG GGG AAC CCG CAC 432
Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys Leu Arg Gly Asn Pro His
130 135 140

TCC TTC GCC GCC CTC TCC ACC TGC CAG GGG CTG CAT GGG GTC TTC TCT 480
Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly Leu His Gly Val Phe Ser
145 150 155 160

GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC CAA GAG GTG GCT GGA CCT 528
Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln Glu Val Ala Gly Pro
165 170 175

TGG GGA GCC CCT CAG GGA CCC CTT CCC CAC CTC ATT TAC CGG ACC CCT 576
Trp Gly Ala Pro Gln Gly Pro Leu Pro His Leu Ile Tyr Arg Thr Pro
180 185 190

CTC CTC CCA GAT CCC CTC GGA TGC AGG GAA CCA GGC TGC CTG TTT GCT 624
Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu Pro Gly Cys Leu Phe Ala
195 200 205

GTG CCT GCC CAG TCG GCT CCT CCA AAC CGG CCG AGG CTG AGA AGG AAA 672
Val Pro Ala Gln Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys
210 215 220

AGG CAG GTC CGC CGG GGC CAC CCT ACA GTG CAC AGT GAA ACC AAG TAT 720
Arg Gln Val Arg Arg Gly His Pro Thr Val His Ser Glu Thr Lys Tyr
225 230 235 240

GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG CTG TTC GAG CAG ATG CGA 768
Val Glu Leu Ile Val Ile Asn Asp His Gln Leu Phe Glu Gln Met Arg
245 250 255

CAG TCG GTG GTC CTC ACC AGC AAC TTT GCC AAG TCC GTG GTG AAC CTG 816
Gln Ser Val Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu
260 265 270

GCC GAT GTG ATA TAC AAG GAG CAG CTC AAC ACT CGC ATC GTC CTG GTT 864
Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val
275 280 285

GCC ATG GAA ACA TGG GCA GAT GGG GAC AAG ATC CAG GTG CAG GAT GAC 912
Ala Met Glu Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp
290 295 300

CTC CTG GAG ACC CTG GCC CGG CTC ATG GTC TAC CGA CGG GAG GGT CTG 960
Leu Leu Glu Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu
305 310 315 320

CCT GAG CCC AGT AAT GCC ACC CAC CTC TTC TCG GGC AGG ACC TTC CAG 1008
Pro Glu Pro Ser Asn Ala Thr His Leu Phe Ser Gly Arg Thr Phe Gln
325 330 335

AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG GGC ATA TGC TCC CTG TCC 1056
Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly Ile Cys Ser Leu Ser
340 345 350

CAT GGC GGG GGT GTG AAC GAG TAC GGC AAC ATG GGG GCG ATG GCC GTG 1104
His Gly Gly Gly Val Asn Glu Tyr Gly Asn Met Gly Ala Met Ala Val
355 360 365

ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG GGC ATG ATG TGG AAC AAA 1152
Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly Met Met Trp Asn Lys
370 375 380

CAC CGG AGC TCG GCA GGG GAC TGC AAG TGT CCA GAC ATC TGG CTG GGC 1200
His Arg Ser Ser Ala Gly Asp Cys Lys Cys Pro Asp Ile Trp Leu Gly
385 390 395 400

TGC ATC ATG GAG GAC ACT GGG TTC TAC CTG CCC CGC AAG TTC TCT CGC 1248
Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu Pro Arg Lys Phe Ser Arg
405 410 415

TGC AGC ATC GAC GAG TAC AAC CAG TTT CTG CAG GAG GGT GGT GGC AGC 1296
Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu Gln Glu Gly Gly Gly Ser
420 425 430

TGC CTC TTC AAC AAG CCC CTC AAG CTC CTG GAC CCC CCA GAG TGC GGG 1344
Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu Asp Pro Pro Glu Cys Gly
435 440 445

AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC GAC TGC GGC TCG GTG CAG 1392
Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val Gln
450 455 460

GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC AAG AAA TGC ACC CTG ACT 1440
Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr Leu Thr
465 470 475 480

CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC TGT CGC CGC TGC AAG TAC 1488
His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys Lys Tyr
485 490 495

GAA CCA CGG GGT GTG TCC TGC CGA GAG GCC GTG AAC GAG TGC GAC ATC 1536
Glu Pro Arg Gly Val Ser Cys Arg Glu Ala Val Asn Glu Cys Asp Ile
500 505 510

GCG GAG ACC TGC ACC GGG GAC TCT AGC CAG TGC CCG CCT AAC CTG CAC 1584
Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln Cys Pro Pro Asn Leu His
515 520 525

AAG CTG GAC GGT TAC TAC TGT GAC CAT GAG CAG GGC CGC TGC TAC GGA 1632
Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu Gln Gly Arg Cys Tyr Gly
530 535 540

GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC CAG GTT CTT TGG GGC CAT 1680
Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys Gln Val Leu Trp Gly His
545 550 555 560

GCG GCT GCT GAT CGC TTC TGC TAC GAG AAG CTG AAT GTG GAG GGG ACG 1728
Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr
565 570 575

GAG CGT GGG AGC TGT GGG CGC AAG GGA TCC GGC TGG GTC CAG TGC AGT 1776
Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser
580 585 590

AAG CAG GAC GTG CTG TGT GGC TTC CTC CTC TGT GTC AAC ATC TCT GGA 1824
Lys Gln Asp Val Leu Cys Gly Phe Leu Leu Cys Val Asn Ile Ser Gly
595 600 605

GCT CCT CGG CTA GGG GAC CTG GTG GGA GAC ATC AGT AGT GTC ACC TTC 1872
Ala Pro Arg Leu Gly Asp Leu Val Gly Asp Ile Ser Ser Val Thr Phe
610 615 620

TAC CAC CAG GGC AAG GAG CTG GAC TGC AGG GGA GGC CAC GTG CAG CTG 1920
Tyr His Gln Gly Lys Glu Leu Asp Cys Arg Gly Gly His Val Gln Leu
625 630 635 640

GCG GAC GGC TCT GAC CTG AGC TAT GTG GAG GAT GGC ACA GCC TGC GGG 1968
Ala Asp Gly Ser Asp Leu Ser Tyr Val Glu Asp Gly Thr Ala Cys Gly
645 650 655

CCT AAC ATG TTG TGC CTG GAC CAT CGC TGC CTG CCA GCT TCT GCC TTC 2016
Pro Asn Met Leu Cys Leu Asp His Arg Cys Leu Pro Ala Ser Ala Phe
660 665 670

AAC TTC AGC ACC TGC CCC GGC AGT GGG GAG CGC CGG ATT TGC TCC CAC 2064
Asn Phe Ser Thr Cys Pro Gly Ser Gly Glu Arg Arg Ile Cys Ser His
675 680 685

CAC GGG GTC TGC AGC AAT GAA GGG AAG TGC ATC TGT CAG CCA GAC TGG 2112
His Gly Val Cys Ser Asn Glu Gly Lys Cys Ile Cys Gln Pro Asp Trp
690 695 700

ACA GGC AAA GAC TGC AGT ATC CAT AAC CCC CTG CCC ACG TCC CCA CCC 2160
Thr Gly Lys Asp Cys Ser Ile His Asn Pro Leu Pro Thr Ser Pro Pro
705 710 715 720

ACG GGG GAG ACG GAG AGA TAT AAA GGT CCC AGC GGC ACC AAC ATC ATC 2208
Thr Gly Glu Thr Glu Arg Tyr Lys Gly Pro Ser Gly Thr Asn Ile Ile
725 730 735

ATT GGC TCC ATC GCT GGG GCT GTC CTG GTT GCA GCC ATC GTC CTG GGC 2256
Ile Gly Ser Ile Ala Gly Ala Val Leu Val Ala Ala Ile Val Leu Gly
740 745 750

GGC ACG GGC TGG GGA TTT AAA AAC ATT CGC CGA GGA AGG TCC GGA GGG 2304
Gly Thr Gly Trp Gly Phe Lys Asn Ile Arg Arg Gly Arg Ser Gly Gly
755 760 765

GCC TAAGTGCCAC CCTCCTCCCT CCAAGCCTGG CACCCACCGT CTCGGCCCTG 2357
Ala
769



AACCACGAGG CTGCCCCCAT CCAGCCACGG AGGGAGGCAC CATGCAAATG TCTTCCAGGT 2417
CCAAACCCTT CAACTCCTGG CTCCGCAGGG GTTTGGGTGG GGGCTGTGGC CCTGCCCTTG 2477
GCACCACCAG GGTGGACCAG GCCTGGAGGG CACTTCCTCC ACAGTCCCCC ACCCACCTCC 2537
TGCGGCTCAG CCTTGCACAC CCACTGCCCC GTGTGAATGT AGCTTCCACC TCATGGATTG 2597
CCACAGCTCA ACTCGGGGGC ACCTGGAGGG ATGCCCCCAG GCAGCCACCA GTGGACCTAG 2657
CCTGGATGGC CCCTCCTTGC AACCAGGCAG CTGAGACCAG GGTCTTATCT CTCTGGGACC 2717
TAGGGGGACG GGGCTGACAT CTACATTTTT TAAAACTGAA TCTTAATCGA TGAATGTAAA 2777
CTCGGGGGTG CTGGGGCCAG GGCAGATGTG GGGATGTTTT GACATTTACA GGAGGCCCCG 2837
GAGAAACTGA GGTATGGCCA TGCCCTAGAC CCTCCCCAAG GATGACCACA CCCGAAGTCC 2897
TGTCACTGAG CACAGTCAGG GGCTGGGCAT CCCAGCTTGC CCCCGCTTAG CCCCGCTGAG 2957
CTTGGAGGAA GTATGAGTGC TGATTCAAAC CAAAGCTGCC TGTGCCATGC CCAAGGCCTA 3017
GGTTATGGGT ACGGCAACCA CATGTCCCAG ATCGTCTCCA ATTCGAAAAC AACCGTCCTG 3077
CTGTCCCTGT CAGGACACAT GGATTTTGGC AGGGCGGGGG GGGGTTCTAG AAAATATAGG 3137
TTCCTATAAT AAAATGGCAC CTTCCCCCTT TAAAAAAAAA AAAAAA 3183

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 9278 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) INTERMEDIATE SOURCE:

(A) LIBRARY: human DNA cosmid library

(ix) FEATURE
(A) NAME/KEY: exon 1
(B) LOCATION: 28..44
(ix) FEATURE
(A) NAME/KEY: exon 2
(B) LOCATION: 308..374
(ix) FEATURE
(A) NAME/KEY: exon 3
(B) LOCATION: 909..994
(ix) FEATURE
(A) NAME/KEY: exon 4
(B) LOCATION: 1081..1156
(ix) FEATURE
(A) NAME/KEY: exon 5
(B) LOCATION: 1591..1657
(ix) FEATURE
(A) NAME/KEY: exon 6
(B) LOCATION: 1725..1792
(ix) FEATURE
(A) NAME/KEY: exon 7
(B) LOCATION: 2182..2256
(ix) FEATURE
(A) NAME/KEY: exon 8
(B) LOCATION: 2339..2410
(ix) FEATURE
(A) NAME/KEY: exon 9
(B) LOCATION: 2588..2754
(ix) FEATURE
(A) NAME/KEY: exon 10
(B) LOCATION: 3248..3332
(ix) FEATURE
(A) NAME/KEY: exon 11
(B) LOCATION: 3445..3535
(ix) FEATURE
(A) NAME/KEY: exon 12
(B) LOCATION: 3645..3696
(ix) FEATURE
(A) NAME/KEY: exon 13
(B) LOCATION: 4014..4113
(ix) FEATURE
(A) NAME/KEY: exon 14
(B) LOCATION: 4196..4267
(ix) FEATURE
(A) NAME/KEY: exon 15
(B) LOCATION: 4386..4478
(ix) FEATURE
(A) NAME/KEY: exon 16
(B) LOCATION: 4920..5000
(ix) FEATURE
(A) NAME/KEY: exon 17
(B) LOCATION: 5347..5397
(ix) FEATURE
(A) NAME/KEY: exon 18
(B) LOCATION: 5501..5564
(ix) FEATURE
(A) NAME/KEY: exon 19
(B) LOCATION: 5767..5866
(ix) FEATURE
(A) NAME/KEY: exon 20
(B) LOCATION: 6073..6202
(ix) FEATURE
(A) NAME/KEY: exon 21
(B) LOCATION: 6300..6468
(ix) FEATURE
(A) NAME/KEY: exon 22
(B) LOCATION: 6557..6671
(ix) FEATURE
(A) NAME/KEY: exon 23
(B) LOCATION: 6756..6846
(ix) FEATURE
(A) NAME/KEY: exon 24
(B) LOCATION: 7829..7846
(ix) FEATURE
(A) NAME/KEY: exon 25
(B) LOCATION: 8165..9038

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GCGTTTACTG GCAAACCGCA TTTGTAA ATG TGC TGG CTG AGC CA NNNNNNNNNN 54
Met Cys Trp Leu Ser His
1 5

NNNNCCAGGT GAGTTTCGTC ATCCAGCCTT CAACTCAAAC TTCACCCTGG ACCTGGAGCT 114

GAACCAGTGA GNGTGGCCTT GAGCCCAAGA GGAAGGGCAG TGGTGGNNNG GGGGAGACAT 174

GGCTAGGGCC TGGCTGCTGG GGGTCTGGGG GTTGGGCCTG GCGAGAGGGG ACCTGGGTCC 234

TGACCTGAGG CGAGCCTAAA GCCCGACCTC ACCTCGCCCG TGACCCCCCT TCCTGCTGCC 294

CCCTCTGTCT CAG C CAA CTC CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC 344
Gln Leu Leu Ser Ser Gln Tyr Val Glu Arg His Phe
10 15

AGC CGG GAG GGG ACA ACC CAG CAC AGC ACC GTGAGTGCCA CTGCTGGGGA 394
Ser Arg Glu Gly Thr Thr Gln His Ser Thr
20 25



CCGGGGCCGG GGATGGAAGG GAGGTGCTGT TTCTGTGGTT CTGTGGTCAC AGGTGTAGGG 454 

ACAGGTGGCC ACTGGAGATG GGGTCCTGGG CCTGGCCCCT CAGCACCTTC CCTCTCTCCC 514 

GACCCAGGAG GCTCTGAGGG TGGACAGTGG GCAGCTTAGT GCATAGGGCC CTGAAGTCCC 574 

CTCACTTGGC CCCAGAGCTC TGACCCCCAG CCAGCCCACG TGGGGCCTAC AGGGACACTC 634 

GTTCCGAGCA GGCTGCCAGG ATCCNNNNNN NNNNNNATAG ATGACGTGAA GGAGGCCCAG 694 

AGGTTCCTAA CCCCAGAGGG CTAGGAACTT GCCCAGGGTG GCACGGCAAA TTAGGAGCAC 754 

CAGCCATCTA GAAACAGGCT CCAGAGCCCC AGGNATACCC AGGGATNGTG GCCACCTGCA 814 

CACAGGGCAG CTTCAGTGTC CCCCAAAAAG CCTTGAGGCC CATTGGCTGC CCCCGGCCTC 874 

ATGCCAGCGT TCTGCTCACT GTTCTGCTCC TTAG GGG GCT GGA GAC CAC TGC TAC 929 

Gly Ala Gly Asp His Cys Tyr 
30 35 

TAC CAG GGG AAG CTC CGG GGG AAC CCG CAC TCC TTC GCC GCC CTC TCC 977
Tyr Gln Gly Lys Leu Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser
40 45 50

ACC TGC CAG GGG CTG CA GTGAGTATGG GGAGGGGCCG GGCAGCTGGG 1024
Thr Cys Gln Gly Leu His

AGAAGCCTCT GGCCCAGGCC TGGGGACGGA GGGGAGCTGC GCCTCTCTCT CCACAG T 1081
GGG GTC TTC TCT GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC CAA GAG 1129
Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln Glu
60 65 70

GTG GCT GGA CCT TGG GGA GCC CCT CAG GTAAGCCCCA CACAACCCCT 1176
Val Ala Gly Pro Trp Gly Ala Pro Gln
75 80



TGCCATCCTC TCTGGTGGCC CTGCCAAGCT TGTCCCAACA GCTGTTGCTG CCACCTCTTC 1236
CTCCTCCGGC TCCTCCCTCA GTAACCCCAG CCTCACTGCC CTCTTCAGTG ACCCCAGCTC 1296
TGGTTCCCTC CCTCCTGTGC CCCAGCTCCC CCTGTGCCCC CAGCTCCAAT GTCCCATCTG 1356
TCCCATAAGT GACCTCCCAT TGGGCTCCAA TGTCCTTTGC CCCTGTCTCT CAGGGTGCCC 1416
CCAGGTCTTG ACCCCGGAAT CTGAGCATCT GGGAGATCAG ATCCGACATG GGAGCTGTGG 1476
CCAGTTCTGG GTCACCCCAG GGTGGGGTGG AGGCGAGGGC TGGATCTGGC CCCCGCCAAG 1536
TGGCCTGGAG CAGGCCCAGT TGGCACCCCA AGAACTAATT TCCCCTCATT GCAG GGA 1593

Gly

CCC CTT CCC CAC CTC ATT TAC CGG ACC CCT CTC CTC CCA GAT CCC CTC 1641
Pro Leu Pro His Leu Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu
85 90 95

GGA TGC AGG GAA CCA G GTAAGGGAGG GGAGGGGGGG TGGGGAGGGG CCNGGCTGTG 1697 
Gly Cys Arg Glu Pro Gly 
100 

CCCCCCTCAC CTGCCCCTCC CCGACAG GC TGC CTG TTT GCT GTG CCT GCC CAG 1750 

Cys Leu Phe Ala Val Pro Ala Gln

TCG GCT CCT CCA AAC CGG CCG AGG CTG AGA AGG AAA AGG CAG 1792
Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys Arg Gln
115 120 125 

GTACGGGGGC CCGCACAGAC CTCGGGCTGC AGAGACCTCG GGCTGCAGAG AGACCTCGGC 1852 

CGTGGCCCAG AGCAGGAGGG CACCCTCATC TATGGCTGGG GCGAAGGAAG GCTCAGATGG 1912 

ATGTGGCTGG GGGCCAGGGA CCGTGTCTGG GAGAAGCCCC CACCCCTTCC CTAATGCTGG 1972 

CATCTACAGA GGCCCCATCC TGGGCAAACC GAGGCTGCCT GCCCTCATTC CAAAGCTGAG 2032 

GAAGGACAGG ACCCTCTGCC AGTGGGGAGC TGGCACTGTC CCTGGCTGGA GTCCAGACCC 2092 

CCCCATCCCC ACCGAGTCTG TTCCTGGCTT GGCCATGAGA TCAGTCAGAC ATGGAAGGGA 2152 

CTGATTCCAA GTGCCCACCC ACCCCCCAG GTC CGC CGG GGC CAC CCT ACA GTG 2205 

Val Arg Arg Gly His Pro Thr Val 
130 135 

CAC AGT GAA ACC AAG TAT GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG 2253
His Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln
140 145 150

CTG GTGAGTGCCA GGGCAGGGAC AGGGCGTGAC ACTGGGAGGC CCCTGAGGAG 2306
Leu

CCTGGCCCTC CTCCCATTCT TCTCTCTCCC AG TTC GAG CAG ATG CGA CAG TCG 2359

Phe Glu Gln Met Arg Gln Ser
155

GTG GTC CTC ACC AGC AAC TTT GCC AAG TCC GTG GTG AAC CTG GCC GAT 2407
Val Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu Ala Asp
160 165 170 175

GTG GTAAGCAGCT CTCCCTCCCT CCCTTCCCTC CTCCTCATGC CCCCCCACCC 2460
Val

CACCACACAC ATTAGGGGGC ACTGTCAGCC CCTGGCTCCC ACTTCCTGGA GAGAACAGAC 2520 
AGGCCCTCCT CCAGCCCTGG CCCCAACACC CACTCCCACC CTCCAGCCCC CCTCATCTTC 2580 

TCCCCAG ATA TAC AAG GAG CAG CTC AAC ACT CGC ATC GTC CTG GTT GCC 2629
Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val Ala
180 185 190

ATG GAA ACA TGG GCA GAT GGG GAC AAG ATC CAG GTG CAG GAT GAC CTC 2677
Met Glu Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp Leu
195 200 205

CTG GAG ACC CTG GCC CGG CTC ATG GTC TAC CGA CGG GAG GGT CTG CCT 2725
Leu Glu Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu Pro
210 215 220

GAG CCC AGT AAT GCC ACC CAC CTC TTC TC GTGAGTCCCC CACCCTGCAC 2774
Glu Pro Ser Asn Ala Thr His Leu Phe Ser
225 230

CTCCTGCCAG CCTCTGCTAG TTGCTACAGT GCTTGGGATT ACTTAACACC TGCCCTGTGC 2834 
TGGCTGCTCC TCTCAGAGTC TGGGGACTGG GCTCACCTTG CACCTGCCAC CTACCCCCAG 2894 
CCACATGCAA CAGCTGGGCA TCATCCCCTG AATCTGAGGT TGATGCCCTT GTCTTAGCCC 2954 
TGGTGGTCCT CTTCTGCCTC TCACCTCCCC TTAGTTCTGT CTTTCCCTTC AACTGTCCCN 3014 
NNNNNNNNNN NAGAGTGAAA CTCTGTCTCA AAAGAAAAAN AAAANAAAAG AAGAAAAAAA 3074 
AGAACCCAAG GAGCGGGGGA AGGGTCTTGC CTGGGGTCAC CAAGGCTGAT GTAAAGGGCC 3134 
AGGCTCACCT CCTGAGGAAG GACTCTAGTG TGAGGGGCTC CCCAAGGCCC CACCACCACC 3194 

CGGGGAGCCA CAGGGGAGGG CAGAAGCCAT CCTGACAGCG CACTCCCTTC CAG G GGC 3251 

Gly 

AGG ACC TTC CAG AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG GGC ATA 3299
Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly Ile
235 240 245

TGC TCC CTG TCC CAT GGC GGG GGT GTG AAC GAG GTGAGCAGTG 3342
Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu
250 255 260

GGGGGACATG GCTGGGGTGG CGGCTGAGGG AAAGGGGCTT AGGGGCACGA CGTGCCTGNT 3402
TGGAAGATGT AGACATCTGT GCCCCATCTT CCCCACCCCC AG TAC GGC AAC ATG 3456

Tyr Gly Asn Met

GGG GCG ATG GCC GTG ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG GGC 3504
Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly
265 270 275 280

ATG ATG TGG AAC AAA CAC CGG AGC TCG GCA G GTATCCTCCC CCAGAGGCCC 3555
Met Met Trp Asn Lys His Arg Ser Ser Ala Gly
285 290

CCGTGTGGCC CAGCAGCTCT GGAACGGGAG GGTGACAGTG GGAGGGGTGG TCCTTGGCCT 3615

CCCTCATATC CGCCTGGCTC ACCCCTCAG GG GAC TGC AAG TGT CCA GAC ATC 3667

Asp Cys Lys Cys Pro Asp Ile
295

TGG CTG GGC TGC ATC ATG GAG GAC ACT GG GTGAGTTCTT GGGGACAACC 3716
Trp Leu Gly Cys Ile Met Glu Asp Thr Gly
300 305

GGGGGAAGGT CTTGGGCGAG GGGAGTCTTA GAGCGAGCAT TGTTTGGCAG TCTGGACCAG 3776
GGGNNNNNNN NNNNNGAACA CACCTTCCCT TCCAGGCCGG CTTGCGAGTC CCAGGTTCAA 3836
GCGAGGGATG GGAGCGACAA GGGACAAGGC GGAGGATTCT GGTGCAATCC CGGGGCAGAT 3896
CCTCCGCCTC CTCGCGATGG TGACGAAGTC CCCCAGTGTA CCCCCTCCCC AGCCTTGAGA 3956
GGGGTGAGGG TGGGTTGGAG GGGAGCAGCC AGCAGCACCT CCCCTCGCCC TATCCAG G 4014

TTC TAC CTG CCC CGC AAG TTC TCT CGC TGC AGC ATC GAC GAG TAC AAC 4062
Phe Tyr Leu Pro Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn
310 315 320

CAG TTT CTG CAG GAG GGT GGT GGC AGC TGC CTC TTC AAC AAG CCC CTC 4110
Gln Phe Leu Gln Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu
325 330 335 340
AAG GTACCAGCCC CGCGGCGGGG AGCATGGGAG CGGGCCCTGG GCGGGGTCCG 4163
Lys 

GGCCAGACTC CCGACCTGTC CTCCCGGTCC AG CTC CTG GAC CCC CCA GAG TGC 4216 

Leu Leu Asp Pro Pro Glu Cys 
345 

GGG AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC GAC TGC GGC TCG GTG 4264 
Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val 
350 355 360 

CAG GTGAGCGGTG GTGCGGGCGC CAGGTGGGGA ACCGGGATGC GGGGGTGGGC N 4317 

Gln

365 

ACCAGGGAGC GTCTGAGTGG GAGGATTAGG GCTCGCCCGC CTCCTTCCCC TCCTCCCGCG 4377 

TCCCTCAG GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC AAG AAA TGC ACC 4427 
Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr
370 375 

CTG ACT CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC TGT CGC CGC TGC 4475 
Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys 
380 385 390 395 

AAG GTAAGCAGGA CCGGCCGGGA GGCGGGGCCA GGACGCAGGA GGAGCGATTG 4528 
Lys 



GAGGCCTTCA TATAAGGGGT GGGAGCTAGG GAGGGAAGCG GAGCCTTCGG GGACGAAGGC 4588
CTCTGGGGCA GGGCTTGATG CGAAGACAGC GCCAATGGGA GCAAGGGCGG GCTGAAGGAT 4648
GTTGAAGGCN NNNNNNNNNN NNNCGGACGG GAAGCTCCCA GAATCAAGGA GGGCGGGAAG 4708
GTGGGCGGGC TTGGGGCGGT GCTGAGTGCG CTGGGAGCGA GGTGGGGAGC GTTCAAGAGG 4768
TGGTGGGAGC AGGGAAATAA GAACAGGCCT AAACGGGGCC CTGGGGAGCT GGAGGGCCCG 4828
GGGATGTGGG GGTCCAGAGA GCGGGGGGCC TGGGGAGGGC AGGGCCGAGG CATCCATCCT 4888

GCCTGACTCG AGGAGCGCGT CTCTTCCCTA G TAC GAA CCA CGG GGT GTG TCC 4940
Tyr Glu Pro Arg Gly Val Ser
400
TGC CGA GAG GCC GTG AAC GAG TGC GAC ATC GCG GAG ACC TGC ACC GGG 4988
Cys Arg Glu Ala Val Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly
405 410 415 

GAC TCT AGC CAG GTCCGCCCGG CCCCGCCGTC TTGTGGAGCC CTGGGCGAGG 5040 

Asp Ser Ser Gln

420 

CAACCCCTAC CCTTGTCGAT TTGGTTTTCC CGGACGAGTG CTCAGCACTC CCCTCCTCTC 5100 
CACAGCTGGC ATCGACCTTC ACTGATCAGA CTGTTTTCTT ATCTGAGAAA GGGGTTCTTC 5160 
ATGCTCCTGG CCTTGTTCCT TCAATCATTA AACCAGAATG TATCGTCTGG CTGGTATCCC 5220 
AGCGCCTGGG CCCGGTGNNN NNNNNNNNTA CCCAGATTCC TCCTGGGCAG CCCTCAGCTC 5280 
CAGTCCTGGG riarrrrrAa rrrAarrrTr. cr.&rTn.r-rrr r-rxr* irrrr irr^fTrTPT -„ 

CCACAG TGC CCG CCT AAC CTG CAC AAG CTG GAC GGT TAC TAC TGT GAC 5388 
Cys Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp 
425 430 435 

CAT GAG CAG GTATGATGGC TGCCCCCTGA GCCTGGGATT CAGGGCAGTC 5437 
His Glu Gln
440 

TCTTATCTCC ACTCTGACCA CTCAGCATCT CCATCCCTTG CCTCTTAATT CTTGGACTCT 5497 

CAG GGC CGC TGC TAC GGA GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC 5545 
Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys
445 450 455 

CAG GTT CTT TGG GGC CAT G GTGAGTCTGC TAGGGCTGGA GTGGGACTCC 5594 
Gln Val Leu Trp Gly His Ala
460 

GGAGGAGCCC AGAGCTGAGA AGCTGGGGAG AGTGGGTTCC AGCTGAACAG GCCCCCAAGT 5654 
GTGTAGCTCC CCAGGATCTC AGGGAGCCCA GGCAGAGTGT GGGAGATGCA GGCCTGAGGT 5714 

CTTGGGGTGG GTCCTGGGGC ACGTGGGGTC ACTTGGCATC CTCTCCCCAC AG CG GCT 5771 

Ala 

GCT GAT CGC TTC TGC TAC GAG AAG CTG AAT GTG GAG GGG ACG GAG CGT 5819 
Ala Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr Glu Arg 
465 470 475 
GGG AGC TGT GGG CGC AAG GGA TCC GGC TGG GTC CAG TGC AGT AAG CA 5866
Gly Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser Lys Gln
480 485 490 495 

GTGAGTACTG AGGCTCCCAG AGGGCCTCTC AGCTCCAGGG CAGGTGTGAG ACTTTTCAGA 5926 
GATGGGGTAG TAGGTTCTCC CAGGAGGAGC CTGTCAGTCC CAATGGGCGG GCACGTGGCA 5986 
AATGAGGTGG CAGGGTGCAG GGTGAGGGCA GATTAGAGTT CAGTAGTTGA GTCTGAGGTC 6046 

AAACTTGGGG CTCACTGTCT CTATAT G CCC CM CAG GGA CGT GCT GTG TGG 6097 

Pro Gln Gln Gly Arg Ala Val Trp
500 

CTT CCT CCT CTG TGT CAA CAT CTC TGG AGC TCC TCG GCT AGG GGA CCT 6145 
Leu Pro Pro Leu Cys Gln His Leu Trp Ser Ser Ser Ala Arg Gly Pro
505 510 515 

GGT GGG AGA CAT CAG TAGTGTCACC TTCTACCACC AGGGCAAGGA GCTGGACTGC 6200 

Gly Gly Arg His Gln

520 

AGGTGCTGAC CAGCACCAAA ACTCAGGGAG GGGACCTGGC AGCTGTGCTG GGGGTTAGAA 6260 
GATCTGGGGG CTGGAGGCTG GGCTGTGTCA CTTCCCCAGG GGAGGCCACG TGCAGCTGGC 6320 
GGACGGCTCT GACCTGAGCT ATGTGGAGGA TGGCACAGCC TGCGGGCCTA ACATGTTGTG 6380 
CCTGGACCAT CGCTGCCTGC CAGCTTCTGC CTTCAACTTC AGCACCTGCC CCGGCAGTGG 6440 
GGAGCGCCGG ATTTGCTCCC ACCACGGGGT GACTGCCTGG AGCCCGGGAT GGCGGGAGAA 6500 
GCTTACAAGA GGGGACAGGC CCCTGCTCAC CTCTCCTGGC CCTGCCCTGC CTCTAGGTCT 6560 
GCAGCAATGA AGGGAAGTGC ATCTGTCAGC CAGACTGGAC AGGCAAAGAC TGCAGTATCC 6620 
ATAACCCCCT GCCCACGTCC CCACCCACGG GGGAGACGGA GAGATATAAA GGTGAGGCTG 6680 
GAGCTGGCCG AGGGGGGTCT GTCTGTCCCG CTCTCTATGC CTGTCCTTGC CAGCTAAGCC 6740 
CTGCCATCCT CCCAGGTCCC AGCGGCACCA ACATCATCAT TGGCTCCATC GCTGGGGCTG 6800 
TCCTGGTTGC AGCCATCGTC CTGGGCGGCA CGGGCTGGGG ATTTAAGTAA GAGACACACA 6860 
CACCCTGTGC CCCCTGGCAT CCTTGAGGGG GGATCAGAAT CCCTACTGGT GGAGCTGAGG 6920 
GGGCCCTCCC TGAAAGCCCA ACTGAACCAG AGCTCACACG TCATAGGTCC AAGTAGCCTG 6980 
CAGGGCTTAA CATTTAGAAA CTAGGAGATT TTAGGCTAGA TGAGGTGCTC ACGCCTGTAA 7040 
TCCCAGCACT TTGGGAGGCC AAGGCAGGCG GATCACCTGA GGTCAGGAAT TCAAGACCAG 7100 
TCTGGCCAAC ATGGTGAAAC CCGTCTCTAT TAAAAATACA AAAATTAGCC AGCCATGGTG 7160 
GTGCACACCT GTAATCCCAG CTACTTGCGA GGCTGAGGCA GAGAATTGCT TGAACCCGGG 7220 
AGGTGGAGGT TGCAGTGAGC TGAGATCGCA CCATTGCACT CCAGCCTTGG GTGACAGAGC 7280 
AAGACTGCGT CAAAAAAAAA AAAAAAAAAA AAAAAAAGGA AAGAAAGAGA GAAAGAAAAG 7340 
AAAAGAGAAA AGAAATCAGG AGATTTTACA CTAGCAATTC GGATTTCCAG CTCTGGAAAC 7400 
ATGAAAAGGT TGAGCCCCAG CGTGCCTCTA AGCATCCCCA AATAGCCACA GAGTGGAGCT 7460 
GGGCAGGGGC CACCCAAGCC AGGCATGTGT CCTCCAGTCT CCAGTTCCCA CCAGCCTATA 7520
CTCCTTTGTG CGTGTCTAAG TTTGGGGTCC TTGTGCCTGG TCTTACCCCC CTTAATGTGC 7580 
AGAGGGAGGA ACCCACGGCC CAAGGTCACA TGATTGAGTT AGTAGCAGAG TCAGAGCTGG 7640 
AACCGGGACG CATTTTTGTG GGTGCCCTGG GTAATTCTCC CTGGCCCTTA CATTAGTGTC 7700 
CAGGCCCCGG GGACCCCGGC CCCGCTCTGG GGCAAGGGGT CGCATGGCAG CCAAAGGCCC 7760 
CTCCCTGAGA GAAGCAAAAG GTCAGATGTC TCCTTTTCCT CTCCCCTTCC ACCATCCTCC 7820 
CCCTGCAGAA ACATTCGCCG AGGAAGGTAC GACCCGACCC AGCTGGGGGC AGTGTGATGC 7880 
CGGCCACGTC ATCCCTCCCG CTGTCCTTGT CTCCTCCATC TCATTCGTCA CCCGCGTTCT 7940 
GTTGATGGGG TGCGGGGCCG ATCCCACCCT GCGTGCCNNN NNNNNNNNNN ATCTGTTTTG 8000 
TCTTCCATAT CACCACTGTC TGACCTCCCG CAGATCCCTT CCCTGGCCAG CCTGTGACTT 8060 
GCCGCCTGCC TCCAGGGCCC AGAACTGAGC TCCGGGGCCC TGCTGGGGGG CTCTCCCCGA 8120 
GGCCCCTGCT CACGTCCTCC CCTGATGCCC CCTCTCCGTT CCAGGTCCGG AGGGGCCTAA 8180 
GTGCCACCCT CCTCCCTCCA AGCCTGGCAC CCACCGTCTC GGCCCTGAAC CACGAGGCTG 8240 
CCCCCATCCA GCCACGGAGG GAGGCACCAT GCAAATGTCT TCCAGGTCCA AACCCTTCAA 8300 
CTCCTGGCTC CGCAGGGGTT TGGGTGGGGG CTGTGGCCCT GCCCTTGGCA CCACCAGGGT 8360 
GGACCAGGCC TGGAGGGCAC TTCCTCCACA GTCCCCCACC CACCTCCTGC GGCTCAGCCT 8420 
TGCACACCCA CTGCCCCGTG TGAATGTAGC TTCCACCTCA TGGATTGCCA CAGCTCAACT 8480 
CGGGGGCACC TGGAGGGATG CCCCCAGGCA GCCACCAGTG GACCTAGCCT GGATGGCCCC 8540 
TCCTTGCAAC CAGGCAGCTG AGACCAGGGT CTTATCTCTC TGGGACCTAG GGGGACGGGG 8600 
CTGACATCTA CATTTTTTAA AACTGAATCT TAATCGATGA ATGTAAACTC GGGGGTGCTG 8660 
GGGCCAGGGC AGATGTGGGG ATGTTTTGAC ATTTACAGGA GGCCCCGGAG AAACTGAGGT 8720 
ATGGCCATGC CCTAGACCCT CCCCAAGGAT GACCACACCC GAAGTCCTGT CACTGAGCAC 8780 
AGTCAGGGGC TGGGCATCCC AGCTTGCCCC CGCTTAGCCC CGCTGAGCTT GGAGGAAGTA 8840 
TGAGTGCTGA TTCAAACCAA AGCTGCCTGT GCCATGCCCA AGGCCTAGGT TATGGGTACG 8900 
GCAACCACAT GTCCCAGATC GTCTCCAATT CGAAAACAAC CGTCCTGCTG TCCCTGTCAG 8960 
GACACATGGA TTTTGGCAGG GCGGGGGGGG GTTCTAGAAA ATATAGGTTC CTATAATAAA 9020 
ATGGCACCTT CCCCCTTTNN NNNNNNNNNN NNNGGGATAC CTCTGAATAT GGGTATCTGG 9080 
GGCTGGATAT GGGTGGGACA TGAGACTTCC TGTGACCAGC CACCCTGGCT CCCAGCTCTC 9140 
TGTATCCTCC TGCCCCGCCC TGGGGGGTGC CTACCCTGGN AGAACCCAGG GAGGAGTGGA 9200 
GGCTGCCTCT GCCTGGGCCT CCACACAGCA TCCTGACATA CGCCACCTGG GGTGGGGGTG 9260 
GGGAGGCAGG GCCAGGAG 9278 

(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Genomic DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GCACCTGCCC CGGCAGT 17
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCAGGACAGC CCCAGCGATG 20
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GGCTGCTGAT CGCTTCTGCT AC 22
(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GAGAAGCTGA ATGTGGAGGG 20

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GTCAGAGCCG TCCGCCAGC 19 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GCCATCCTCC ACATAGCTCA GG 22 
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

GATGTAAGTC AAGTTCCCAT CAGAGA 26 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

AACAGCTGGT GGTCGTTGAT CACAA 25 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
ATGAGGCTGC TGCGGCGCTG 20 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CACAGATCTG GGGGCATATG CTCCCTG 27
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

AACAAGCTTC TACTGATGTC TCCCACC 27

SEQUENCE LISTING

Reference: 61 826 / u6 
Date: May 13, 1994 

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: (1) CANCER INSTITUTE 

(2) EISAI CO., LTD.

(B) STREET: (1) 37-1, Kamiikebukuro 1-chome, Toshima-ku
(2) 6-10, Koishikawa 4-chome, Bunkyo-ku

(C) CITY: Tokyo 

(D) STATE: 

(E) COUNTRY: Japan 

(ii) TITLE OF INVENTION: MDC PROTEINS AND DNAs ENCODING 

THE SAME 



(iii) NUMBER OF SEQUENCES: 20



(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 488 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULAR TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 
Leu Leu Ser Ser Gln Tyr Val Glu Arg His Phe Ser Arg Glu Gly Thr
1 5 10 15

Thr Gln His Ser Thr Gly Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys
20 25 30

Leu Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly
35 40 45

Leu His Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro
50 55 60

Gln Glu Val Ala Gly Pro Trp Gly Ala Pro Gln Gly Pro Leu Pro His
65 70 75 80

Leu Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu
85 90 95

Pro Gly Cys Leu Phe Ala Val Pro Ala Gln Ser Ala Pro Pro Asn Arg
100 105 110

Pro Arg Leu Arg Arg Lys Arg Gln Val Arg Arg Gly His Pro Thr Val
115 120 125

His Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln
130 135 140

Leu Phe Glu Gln Met Arg Gln Ser Val Val Leu Thr Ser Asn Phe Ala
145 150 155 160

Lys Ser Val Val Asn Leu Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn
165 170 175

Thr Arg Ile Val Leu Val Ala Met Glu Thr Trp Ala Asp Gly Asp Lys
180 185 190

Ile Gln Val Gln Asp Asp Leu Leu Glu Thr Leu Ala Arg Leu Met Val
195 200 205

Tyr Arg Arg Glu Gly Leu Pro Glu Pro Ser Asn Ala Thr His Leu Phe
210 215 220

Ser Gly Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly
225 230 235 240

Gly Ile Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu Tyr Gly Asn
245 250 255

Met Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu
260 265 270

Gly Met Met Trp Asn Lys His Arg Ser Ser Ala Gly Asp Cys Lys Cys
275 280 285

Pro Asp Ile Trp Leu Gly Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu
290 295 300

Pro Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu
305 310 315 320

Gln Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu
325 330 335

Asp Pro Pro Glu Cys Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys
340 345 350

Asp Cys Gly Ser Val Gln Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys
355 360 365

Lys Lys Cys Thr Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys
370 375 380

Cys Arg Arg Cys Lys Tyr Glu Pro Arg Gly Val Ser Cys Arg Glu Ala
385 390 395 400

Val Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln
405 410 415

Cys Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu
420 425 430

Gln Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys
435 440 445

Gln Val Leu Trp Gly His Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys
450 455 460

Leu Asn Val Glu Gly Thr Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser
465 470 475 480

Gly Trp Val Gln Cys Ser Lys Gln
485 488
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 524 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULAR TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
(ii) MOLECULAR TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Cys Trp Leu Ser His Gln Leu Leu Ser Ser Gln Tyr Val Glu Arg
1 5 10 15

His Phe Ser Arg Glu Gly Thr Thr Gln His Ser Thr Gly Ala Gly Asp
20 25 30

His Cys Tyr Tyr Gln Gly Lys Leu Arg Gly Asn Pro His Ser Phe Ala
35 40 45

Ala Leu Ser Thr Cys Gln Gly Leu His Gly Val Phe Ser Asp Gly Asn
50 55 60

Leu Thr Tyr Ile Val Glu Pro Gln Glu Val Ala Gly Pro Trp Gly Ala
65 70 75 80

Pro Gln Gly Pro Leu Pro His Leu Ile Tyr Arg Thr Pro Leu Leu Pro
85 90 95

Asp Pro Leu Gly Cys Arg Glu Pro Gly Cys Leu Phe Ala Val Pro Ala
100 105 110

Gln Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys Arg Gln Val
115 120 125

Arg Arg Gly His Pro Thr Val His Ser Glu Thr Lys Tyr Val Glu Leu
130 135 140

Ile Val Ile Asn Asp His Gln Leu Phe Glu Gln Met Arg Gln Ser Val
145 150 155 160

Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu Ala Asp Val
165 170 175

Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val Ala Met Glu
180 185 190

Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp Leu Leu Glu
195 200 205

Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu Pro Glu Pro
210 215 220

Ser Asn Ala Thr His Leu Phe Ser Gly Arg Thr Phe Gln Ser Thr Ser
225 230 235 240

Ser Gly Ala Ala Tyr Val Gly Gly Ile Cys Ser Leu Ser His Gly Gly
245 250 255

Gly Val Asn Glu Tyr Gly Asn Met Gly Ala Met Ala Val Thr Leu Ala
260 265 270

Gln Thr Leu Gly Gln Asn Leu Gly Met Met Trp Asn Lys His Arg Ser
275 280 285

Ser Ala Gly Asp Cys Lys Cys Pro Asp Ile Trp Leu Gly Cys Ile Met
290 295 300

Glu Asp Thr Gly Phe Tyr Leu Pro Arg Lys Phe Ser Arg Cys Ser Ile
305 310 315 320

Asp Glu Tyr Asn Gln Phe Leu Gln Glu Gly Gly Gly Ser Cys Leu Phe
325 330 335

Asn Lys Pro Leu Lys Leu Leu Asp Pro Pro Glu Cys Gly Asn Gly Phe
340 345 350

Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val Gln Glu Cys Ser
355 360 365

Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr Leu Thr His Asp Ala
370 375 380

Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys Lys Tyr Glu Pro Arg
385 390 395 400

Gly Val Ser Cys Arg Glu Ala Val Asn Glu Cys Asp Ile Ala Glu Thr
405 410 415

Cys Thr Gly Asp Ser Ser Gln Cys Pro Pro Asn Leu His Lys Leu Asp
420 425 430

Gly Tyr Tyr Cys Asp His Glu Gln Gly Arg Cys Tyr Gly Gly Arg Cys
435 440 445

Lys Thr Arg Asp Arg Gln Cys Gln Val Leu Trp Gly His Ala Ala Ala
450 455 460

Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr Glu Arg Gly
465 470 475 480

Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser Lys Gln Asp
485 490 495

Val Leu Cys Gly Phe Leu Leu Cys Val Asn Ile Ser Gly Ala Pro Arg
500 505 510

Leu Gly Asp Leu Val Gly Asp Ile Ser Ser Val Thr Phe Tyr His Gln
515 520 525

Gly Lys Glu Leu Asp Cys Arg Gly Gly His Val Gln Leu Ala Asp Gly
530 535 540

Ser Asp Leu Ser Tyr Val Glu Asp Gly Thr Ala Cys Gly Pro Asn Met
Met Cys Trp Leu Ser His Gln Leu Leu Ser Ser Gln Tyr Val Glu Arg
1 5 10 15

His Phe Ser Arg Glu Gly Thr Thr Gln His Ser Thr Gly Ala Gly Asp
20 25 30

His Cys Tyr Tyr Gln Gly Lys Leu Arg Gly Asn Pro His Ser Phe Ala
35 40 45

Ala Leu Ser Thr Cys Gln Gly Leu His Gly Val Phe Ser Asp Gly Asn
50 55 60

Leu Thr Tyr Ile Val Glu Pro Gln Glu Val Ala Gly Pro Trp Gly Ala
65 70 75 80

Pro Gln Gly Pro Leu Pro His Leu Ile Tyr Arg Thr Pro Leu Leu Pro
85 90 95

Asp Pro Leu Gly Cys Arg Glu Pro Gly Cys Leu Phe Ala Val Pro Ala
100 105 110

Gln Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys Arg Gln Val
115 120 125

Arg Arg Gly His Pro Thr Val His Ser Glu Thr Lys Tyr Val Glu Leu
130 135 140

Ile Val Ile Asn Asp His Gln Leu Phe Glu Gln Met Arg Gln Ser Val
145 150 155 160

Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu Ala Asp Val
165 170 175

Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val Ala Met Glu
180 185 190

Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp Leu Leu Glu
195 200 205

Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu Pro Glu Pro
210 215 220

Ser Asn Ala Thr His Leu Phe Ser Gly Arg Thr Phe Gln Ser Thr Ser
225 230 235 240

Ser Gly Ala Ala Tyr Val Gly Gly Ile Cys Ser Leu Ser His Gly Gly
245 250 255

Gly Val Asn Glu Tyr Gly Asn Met Gly Ala Met Ala Val Thr Leu Ala
260 265 270

Gln Thr Leu Gly Gln Asn Leu Gly Met Met Trp Asn Lys His Arg Ser
275 280 285

Ser Ala Gly Asp Cys Lys Cys Pro Asp Ile Trp Leu Gly Cys Ile Met
290 295 300

Glu Asp Thr Gly Phe Tyr Leu Pro Arg Lys Phe Ser Arg Cys Ser Ile
305 310 315 320

Asp Glu Tyr Asn Gln Phe Leu Gln Glu Gly Gly Gly Ser Cys Leu Phe
325 330 335

Asn Lys Pro Leu Lys Leu Leu Asp Pro Pro Glu Cys Gly Asn Gly Phe
340 345 350

Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val Gln Glu Cys Ser
355 360 365

Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr Leu Thr His Asp Ala
370 375 380

Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys Lys Tyr Glu Pro Arg
385 390 395 400

Gly Val Ser Cys Arg Glu Ala Val Asn Glu Cys Asp Ile Ala Glu Thr
405 410 415

Cys Thr Gly Asp Ser Ser Gln Cys Pro Pro Asn Leu His Lys Leu Asp
420 425 430

Gly Tyr Tyr Cys Asp His Glu Gln Gly Arg Cys Tyr Gly Gly Arg Cys
435 440 445

Lys Thr Arg Asp Arg Gln Cys Gln Val Leu Trp Gly His Ala Ala Ala
450 455 460

Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr Glu Arg Gly
465 470 475 480

Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser Lys Gln Pro
485 490 495

Gln Gln Gly Arg Ala Val Trp Leu Pro Pro Leu Cys Gln His Leu Trp
500 505 510

Ser Ser Ser Ala Arg Gly Pro Gly Gly Arg His Gln
515 520 524

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 670 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear
545 550 555 560

Leu Cys Leu Asp His Arg Cys Leu Pro Ala Ser Ala Phe Asn Phe Ser
565 570 575

Thr Cys Pro Gly Ser Gly Glu Arg Arg Ile Cys Ser His His Gly Val
580 585 590

Cys Ser Asn Glu Gly Lys Cys Ile Cys Gln Pro Asp Trp Thr Gly Lys
595 600 605

Asp Cys Ser Ile His Asn Pro Leu Pro Thr Ser Pro Pro Thr Gly Glu
610 615 620

Thr Glu Arg Tyr Lys Gly Pro Ser Gly Thr Asn Ile Ile Ile Gly Ser
625 630 635 640

Ile Ala Gly Ala Val Leu Val Ala Ala Ile Val Leu Gly Gly Thr Gly
645 650 655

Trp Gly Phe Lys Asn Ile Arg Arg Gly Arg Ser Gly Gly Ala
660 665 670



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 769 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULAR TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Arg Leu Leu Arg Arg Trp Ala Phe Ala Ala Leu Leu Leu Ser Leu
1 5 10 15

Leu Pro Thr Pro Gly Leu Gly Thr Gln Gly Pro Ala Gly Ala Leu Arg
20 25 30

Trp Gly Gly Leu Pro Gln Leu Gly Gly Pro Gly Ala Pro Glu Val Thr
35 40 45

Glu Pro Ser Arg Leu Val Arg Glu Ser Ser Gly Gly Glu Val Arg Lys
50 55 60

Gln Gln Leu Asp Thr Arg Val Arg Gln Glu Pro Pro Gly Gly Pro Pro
65 70 75 80

Val His Leu Ala Gln Val Ser Phe Val Ile Pro Ala Phe Asn Ser Asn
85 90 95
Phe Thr Leu Asp Leu Glu Leu Asn His His Leu Leu Ser Ser Gln Tyr
100 105 110

Val Glu Arg His Phe Ser Arg Glu Gly Thr Thr Gln His Ser Thr Gly
115 120 125

Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys Leu Arg Gly Asn Pro His
130 135 140

Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly Leu His Gly Val Phe Ser
145 150 155 160

Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln Glu Val Ala Gly Pro
165 170 175

Trp Gly Ala Pro Gln Gly Pro Leu Pro His Leu Ile Tyr Arg Thr Pro
180 185 190

Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu Pro Gly Cys Leu Phe Ala
195 200 205

Val Pro Ala Gln Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys
210 215 220

Arg Gln Val Arg Arg Gly His Pro Thr Val His Ser Glu Thr Lys Tyr
225 230 235 240

Val Glu Leu Ile Val Ile Asn Asp His Gln Leu Phe Glu Gln Met Arg
245 250 255

Gln Ser Val Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu
260 265 270

Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val
275 280 285

Ala Met Glu Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp
290 295 300

Leu Leu Glu Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu
305 310 315 320

Pro Glu Pro Ser Asn Ala Thr His Leu Phe Ser Gly Arg Thr Phe Gln
325 330 335

Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly Ile Cys Ser Leu Ser
340 345 350

His Gly Gly Gly Val Asn Glu Tyr Gly Asn Met Gly Ala Met Ala Val
355 360 365

Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly Met Met Trp Asn Lys
370 375 380
35 40 45

CTG CAT GGG GTC TTC TCT GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC 192
Leu His Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro
50 55 60

CAA GAG GTG GCT GGA CCT TGG GGA GCC CCT CAG GGA CCC CTT CCC CAC 240
Gln Glu Val Ala Gly Pro Trp Gly Ala Pro Gln Gly Pro Leu Pro His
65 70 75 80

CTC ATT TAC CGG ACC CCT CTC CTC CCA GAT CCC CTC GGA TGC AGG GAA 288
Leu Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu
85 90 95

CCA GGC TGC CTG TTT GCT GTG CCT GCC CAG TCG GCT CCT CCA AAC CGG 336
Pro Gly Cys Leu Phe Ala Val Pro Ala Gln Ser Ala Pro Pro Asn Arg
100 105 110

CCG AGG CTG AGA AGG AAA AGG CAG GTC CGC CGG GGC CAC CCT ACA GTG 384
Pro Arg Leu Arg Arg Lys Arg Gln Val Arg Arg Gly His Pro Thr Val
115 120 125

CAC AGT GAA ACC AAG TAT GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG 432
His Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln
130 135 140

CTG TTC GAG CAG ATG CGA CAG TCG GTG GTC CTC ACC AGC AAC TTT GCC 480
Leu Phe Glu Gln Met Arg Gln Ser Val Val Leu Thr Ser Asn Phe Ala
145 150 155 160

AAG TCC GTG GTG AAC CTG GCC GAT GTG ATA TAC AAG GAG CAG CTC AAC 528
Lys Ser Val Val Asn Leu Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn
165 170 175

ACT CGC ATC GTC CTG GTT GCC ATG GAA ACA TGG GCA GAT GGG GAC AAG 576
Thr Arg Ile Val Leu Val Ala Met Glu Thr Trp Ala Asp Gly Asp Lys
180 185 190

ATC CAG GTG CAG GAT GAC CTC CTG GAG ACC CTG GCC CGG CTC ATG GTC 624
Ile Gln Val Gln Asp Asp Leu Leu Glu Thr Leu Ala Arg Leu Met Val
195 200 205

TAC CGA CGG GAG GGT CTG CCT GAG CCC AGT AAT GCC ACC CAC CTC TTC 672
Tyr Arg Arg Glu Gly Leu Pro Glu Pro Ser Asn Ala Thr His Leu Phe
210 215 220

TCG GGC AGG ACC TTC CAG AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG 720
Ser Gly Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly
225 230 235 240

GGC ATA TGC TCC CTG TCC CAT GGC GGG GGT GTG AAC GAG TAC GGC AAC 768
Gly Ile Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu Tyr Gly Asn
245 250 255

ATG GGG GCG ATG GCC GTG ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG 816
Met Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu
260 265 270

GGC ATG ATG TGG AAC AAA CAC CGG AGC TCG GCA GGG GAC TGC AAG TGT 864
Gly Met Met Trp Asn Lys His Arg Ser Ser Ala Gly Asp Cys Lys Cys
275 280 285

CCA GAC ATC TGG CTG GGC TGC ATC ATG GAG GAC ACT GGG TTC TAC CTG 912
Pro Asp Ile Trp Leu Gly Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu
290 295 300

CCC CGC AAG TTC TCT CGC TGC AGC ATC GAC GAG TAC AAC CAG TTT CTG 960
Pro Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu
305 310 315 320

CAG GAG GGT GGT GGC AGC TGC CTC TTC AAC AAG CCC CTC AAG CTC CTG 1008
Gln Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu
325 330 335

GAC CCC CCA GAG TGC GGG AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC 1056
Asp Pro Pro Glu Cys Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys
340 345 350

GAC TGC GGC TCG GTG CAG GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC 1104
Asp Cys Gly Ser Val Gln Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys
355 360 365

AAG AAA TGC ACC CTG ACT CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC 1152
Lys Lys Cys Thr Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys
370 375 380

TGT CGC CGC TGC AAG TAC GAA CCA CGG GGT GTG TCC TGC CGA GAG GCC 1200
Cys Arg Arg Cys Lys Tyr Glu Pro Arg Gly Val Ser Cys Arg Glu Ala
385 390 395 400

GTG AAC GAG TGC GAC ATC GCG GAG ACC TGC ACC GGG GAC TCT AGC CAG 1248
Val Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln
405 410 415

TGC CCG CCT AAC CTG CAC AAG CTG GAC GGT TAC TAC TGT GAC CAT GAG 1296
Cys Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu






420 425 430

CAG GGC CGC TGC TAC GGA GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC 1344
Gln Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys
435 440 445

CAG GTT CTT TGG GGC CAT GCG GCT GCT GAT CGC TTC TGC TAC GAG AAG 1392
Gln Val Leu Trp Gly His Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys
450 455 460

CTG AAT GTG GAG GGG ACG GAG CGT GGG AGC TGT GGG CGC AAG GGA TCC 1440
Leu Asn Val Glu Gly Thr Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser
465 470 475 480

GGC TGG GTC CAG TGC AGT AAG CAG 1464
Gly Trp Val Gln Cys Ser Lys Gln
485

(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2923 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULAR TYPE: cDNA to mRNA
(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens
(vii) IMMEDIATE SOURCE:

(A) LIBRARY: human fetal brain cDNA library
(ix) FEATURE:

(A) NAME/KEY: 5' UTR

(B) LOCATION: 1..27
(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 28..1599

(ix) FEATURE:

(A) NAME/KEY: 3' UTR

(B) LOCATION: 1600..2923
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
GCGTTTACTG GCAAACCGCA TTTGTAA ATG TGC TGG CTG AGC CAC CAA CTC 51
Met Cys Trp Leu Ser His Gln Leu
1 5

CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA ACC 99
Leu Ser Ser Gln Tyr Val Glu Arg His Phe Ser Arg Glu Gly Thr Thr
10 15 20

CAG CAC AGC ACC GGG GCT GGA GAC CAC TGC TAC TAC CAG GGG AAG CTC 147
Gln His Ser Thr Gly Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys Leu
25 30 35 40

CGG GGG AAC CCG CAC TCC TTC GCC GCC CTC TCC ACC TGC CAG GGG CTG 195
Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly Leu
45 50 55

CAT GGG GTC TTC TCT GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC CAA 243
His Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln
60 65 70

GAG GTG GCT GGA CCT TGG GGA GCC CCT CAG GGA CCC CTT CCC CAC CTC 291
Glu Val Ala Gly Pro Trp Gly Ala Pro Gln Gly Pro Leu Pro His Leu
75 80 85

ATT TAC CGG ACC CCT CTC CTC CCA GAT CCC CTC GGA TGC AGG GAA CCA 339
Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu Pro
90 95 100

GGC TGC CTG TTT GCT GTG CCT GCC CAG TCG GCT CCT CCA AAC CGG CCG 387
Gly Cys Leu Phe Ala Val Pro Ala Gln Ser Ala Pro Pro Asn Arg Pro
105 110 115 120

AGG CTG AGA AGG AAA AGG CAG GTC CGC CGG GGC CAC CCT ACA GTG CAC 435
Arg Leu Arg Arg Lys Arg Gln Val Arg Arg Gly His Pro Thr Val His
125 130 135

AGT GAA ACC AAG TAT GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG CTG 483
Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln Leu
140 145 150

TTC GAG CAG ATG CGA CAG TCG GTG GTC CTC ACC AGC AAC TTT GCC AAG 531
Phe Glu Gln Met Arg Gln Ser Val Val Leu Thr Ser Asn Phe Ala Lys
155 160 165

TCC GTG GTG AAC CTG GCC GAT GTG ATA TAC AAG GAG CAG CTC AAC ACT 579
Ser Val Val Asn Leu Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn Thr
170 175 180
His Arg Ser Ser Ala Gly Asp Cys Lys Cys Pro Asp Ile Trp Leu Gly
385 390 395 400

Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu Pro Arg Lys Phe Ser Arg
405 410 415

Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu Gln Glu Gly Gly Gly Ser
420 425 430

Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu Asp Pro Pro Glu Cys Gly
435 440 445

Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val Gln
450 455 460

Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr Leu Thr
465 470 475 480

His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys Lys Tyr
485 490 495

Glu Pro Arg Gly Val Ser Cys Arg Glu Ala Val Asn Glu Cys Asp Ile
500 505 510

Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln Cys Pro Pro Asn Leu His
515 520 525

Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu Gln Gly Arg Cys Tyr Gly
530 535 540

Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys Gln Val Leu Trp Gly His
545 550 555 560

Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr
565 570 575

Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser
580 585 590

Lys Gln Asp Val Leu Cys Gly Phe Leu Leu Cys Val Asn Ile Ser Gly
595 600 605

Ala Pro Arg Leu Gly Asp Leu Val Gly Asp Ile Ser Ser Val Thr Phe
610 615 620

Tyr His Gln Gly Lys Glu Leu Asp Cys Arg Gly Gly His Val Gln Leu
625 630 635 640

Ala Asp Gly Ser Asp Leu Ser Tyr Val Glu Asp Gly Thr Ala Cys Gly
645 650 655

Pro Asn Met Leu Cys Leu Asp His Arg Cys Leu Pro Ala Ser Ala Phe
660 665 670

Asn Phe Ser Thr Cys Pro Gly Ser Gly Glu Arg Arg Ile Cys Ser His
675 680 685

His Gly Val Cys Ser Asn Glu Gly Lys Cys Ile Cys Gln Pro Asp Trp
690 695 700

Thr Gly Lys Asp Cys Ser Ile His Asn Pro Leu Pro Thr Ser Pro Pro
705 710 715 720

Thr Gly Glu Thr Glu Arg Tyr Lys Gly Pro Ser Gly Thr Asn Ile Ile
725 730 735

Ile Gly Ser Ile Ala Gly Ala Val Leu Val Ala Ala Ile Val Leu Gly
740 745 750

Gly Thr Gly Trp Gly Phe Lys Asn Ile Arg Arg Gly Arg Ser Gly Gly
755 760 765

Ala
769

(2) INFORMATION FOR SEQ ID NO: 5: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1464 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULAR TYPE: cDNA to mRNA 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 
(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: human fetal brain cDNA library 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CTC CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA 48 
Leu Leu Ser Ser Gln Tyr Val Glu Arg His Phe Ser Arg Glu Gly Thr 
1 5 10 15 

ACC CAG CAC AGC ACC GGG GCT GGA GAC CAC TGC TAC TAC CAG GGG AAG 96 
Thr Gln His Ser Thr Gly Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys 
20 25 30 

CTC CGG GGG AAC CCG CAC TCC TTC GCC GCC CTC TCC ACC TGC CAG GGG 144 
Leu Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly 

CGC ATC GTC CTG GTT GCC ATG GAA ACA TGG GCA GAT GGG GAC AAG ATC 627 
Arg Ile Val Leu Val Ala Met Glu Thr Trp Ala Asp Gly Asp Lys Ile 
185 190 195 200 

CAG GTG CAG GAT GAC CTC CTG GAG ACC CTG GCC CGG CTC ATG GTC TAC 675 
Gln Val Gln Asp Asp Leu Leu Glu Thr Leu Ala Arg Leu Met Val Tyr 
205 210 215 

CGA CGG GAG GGT CTG CCT GAG CCC AGT AAT GCC ACC CAC CTC TTC TCG 723 
Arg Arg Glu Gly Leu Pro Glu Pro Ser Asn Ala Thr His Leu Phe Ser 
220 225 230 

GGC AGG ACC TTC CAG AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG GGC 771 
Gly Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly 
235 240 245 

ATA TGC TCC CTG TCC CAT GGC GGG GGT GTG AAC GAG TAC GGC AAC ATG 819 
Ile Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu Tyr Gly Asn Met 
250 255 260 

GGG GCG ATG GCC GTG ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG GGC 867 
Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly 
265 270 275 280 

ATG ATG TGG AAC AAA CAC CGG AGC TCG GCA GGG GAC TGC AAG TGT CCA 915 
Met Met Trp Asn Lys His Arg Ser Ser Ala Gly Asp Cys Lys Cys Pro 
285 290 295 

GAC ATC TGG CTG GGC TGC ATC ATG GAG GAC ACT GGG TTC TAC CTG CCC 963 
Asp Ile Trp Leu Gly Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu Pro 
300 305 310 

CGC AAG TTC TCT CGC TGC AGC ATC GAC GAG TAC AAC CAG TTT CTG CAG 1011 
Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu Gln 
315 320 325 

GAG GGT GGT GGC AGC TGC CTC TTC AAC AAG CCC CTC AAG CTC CTG GAC 1059 
Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu Asp 
330 335 340 

CCC CCA GAG TGC GGG AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC GAC 1107 
Pro Pro Glu Cys Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp 
345 350 355 360 

TGC GGC TCG GTG CAG GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC AAG 1155 
Cys Gly Ser Val Gln Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys 
365 370 375 

AAA TGC ACC CTG ACT CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC TGT 1203 
Lys Cys Thr Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys 
380 385 390 

CGC CGC TGC AAG TAC GAA CCA CGG GGT GTG TCc TGC CGA GAG GCC GTG 1251 
Arg Arg Cys Lys Tyr Glu Pro Arg Gly Val Ser Cys Arg Glu Ala Val 
395 400 405 

AAC GAG TGC GAC ATC GCG GAG ACC TGC ACC GGG GAC TCT AGC CAG TGC 1299 
Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln Cys 
410 415 420 

CCG CCT AAC CTG CAC AAG CTG GAC GGT TAC TAC TGT GAC CAT GAG CAG 1347 
Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu Gln 
425 430 435 440 

GGC CGC TGC TAC GGA GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC CAG 1395 
Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys Gln 
445 450 455 

GTT CTT TGG GGC CAT GCG GCT GCT GAT CGC TTC TGC TAC GAG AAG CTG 1443 
Val Leu Trp Gly His Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys Leu 
460 465 470 

AAT GTG GAG GGG ACG GAG CGT GGG AGC TGT GGG CGC AAG GGA TCC GGC 1491 
Asn Val Glu Gly Thr Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser Gly 
475 480 485 

TGG GTC CAG TGC AGT AAG CAG CCC CAA CAG GGA CGT GCT GTG TGG CTT 1539 
Trp Val Gln Cys Ser Lys Gln Pro Gln Gln Gly Arg Ala Val Trp Leu 
490 495 500 

CCT CCT CTG TGT CAA CAT CTC TGG AGC TCC TCG GCT AGG GGA CCT GGT 1587 
Pro Pro Leu Cys Gln His Leu Trp Ser Ser Ser Ala Arg Gly Pro Gly 
505 510 515 520 

GGG AGA CAT CAG TAGTGTCACC TTCTACCACC AGGGCAAGGA GCTGGACTGC 1639 
Gly Arg His Gln 

AGGGGAGGCC ACGTGCAGCT GGCGGACGGC TCTGACCTGA GCTATGTGGA GGATGGCACA 1699 
GCCTGCGGGC CTAACATGTT GTGCCTGGAC CATCGCTGCC TGCCAGCTTC TGCCTTCAAC 1759 
TTCAGCACCT GCCCCGGCAG TGGGGAGCGC CGGATTTGCT CCCACCACGG GGTCTGCAGC 1819 
AATGAAGGGA AGTGCATCTG TCAGCCAGAC TGGACAGGCA AAGACTGCAG TATCCATAAC 1879 
CCCCTGCCCA CGTCCCCACC CACGGGGGAG ACGGAGAGAT ATAAAGGTCC CAGCGGCACC 1939 
AACATCATCA TTGGCTCCAT CGCTGGGGCT GTCCTGGTTG CAGCCATCGT CCTGGGCGGC 1999 
ACGGGCTGGG GATTTAAAAA CATTCGCCGA GGAAGGTCCG GAGGGGCCTA AGTGCCACCC 2059 




TCCTCCCTCC AAGCCTGGCA CCCACCGTCT CGGCCCTGAA CCACGAGGCT GCCCCCATCC 2119 
AGCCACGGAG GGAGGCACCA TGCAAATGTC TTCCAGGTCC AAACCCTTCA ACTCCTGGCT 2179 
CCGCAGGGGT TTGGGTGGGG GCTGTGGCCC TGCCCTTGGC ACCACCAGGG TGGACCAGGC 2239 
CTGGAGGGCA CTTCCTCCAC AGTCCCCCAC CCACCTCCTG CGGCTCAGCC TTGCACACCC 2299 
ACTGCCCCGT GTGAATGTAG CTTCCACCTC ATGGATTGCC ACAGCTCAAC TCGGGGGCAC 2359 
CTGGAGGGAT GCCCCCAGGC AGCCACCAGT GGACCTAGCC TGGATGGCCC CTCCTTGCAA 2419 
CCAGGCAGCT GAGACCAGGG TCTTATCTCT CTGGGACCTA GGGGGACGGG GCTGACATCT 2479 
ACATTTTTTA AAACTGAATC TTAATCGATG AATGTAAACT CGGGGGTGCT GGGGCCAGGG 2539 
CAGATGTGGG GATGTTTTGA CATTTACAGG AGGCCCCGGA GAAACTGAGG TATGGCCATG 2599 
CCCTAGACCC TCCCCAAGGA TGACCACACC CGAAGTCCTG TCACTGAGCA CAGTCAGGGG 2659 
CTGGGCATCC CAGCTTGCCC CCGCTTAGCC CCGCTGAGCT TGGAGGAAGT ATGAGTGCTG 2719 
ATTCAAACCA AAGCTGCCTG TGCCATGCCC AAGGCCTAGG TTATGGGTAC GGCAACCACA 2779 
TGTCCCAGAT CGTCTCCAAT TCGAAAACAA CCGTCCTGCT GTCCCTGTCA GGACACATGG 2839 
ATTTTGGCAG GGCGGGGGGG GGTTCTAGAA AATATAGGTT CCTATAATAA AATGGCACCT 2899 
TCCCCCTTTA AAAAAAAAAA AAAA 2923 

(2) INFORMATION FOR SEQ ID NO: 7: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2913 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULAR TYPE: cDNA to mRNA 
(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: human fetal brain cDNA library 
(ix) FEATURE: 

(A) NAME/KEY: 5' UTR 

(B) LOCATION: 1..27 
(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 28..2037 
(ix) FEATURE: 

(A) NAME/KEY: 3' UTR 
(B) LOCATION: 2038..2913 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GCGTTTACTG GCAAACCGCA TTTGTAA ATG TGC TGG CTG AGC CAC CAA CTC 51 
Met Cys Trp Leu Ser His Gln Leu 
1 5 

CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA ACC 99 
Leu Ser Ser Gln Tyr Val Glu Arg His Phe Ser Arg Glu Gly Thr Thr 
10 15 20 

CAG CAC AGC ACC GGG GCT GGA GAC CAC TGC TAC TAC CAG GGG AAG CTC 147 
Gln His Ser Thr Gly Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys Leu 
25 30 35 40 

CGG GGG AAC CCG CAC TCC TTC GCC GCC CTC TCC ACC TGC CAG GGG CTG 195 
Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly Leu 
45 50 55 

CAT GGG GTC TTC TCT GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC CAA 243 
His Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln 
60 65 70 

GAG GTG GCT GGA CCT TGG GGA GCC CCT CAG GGA CCC CTT CCC CAC CTC 291 
Glu Val Ala Gly Pro Trp Gly Ala Pro Gln Gly Pro Leu Pro His Leu 
75 80 85 

ATT TAC CGG ACC CCT CTC CTC CCA GAT CCC CTC GGA TGC AGG GAA CCA 339 
Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu Pro 
90 95 100 

GGC TGC CTG TTT GCT GTG CCT GCC CAG TCG GCT CCT CCA AAC CGG CCG 387 
Gly Cys Leu Phe Ala Val Pro Ala Gln Ser Ala Pro Pro Asn Arg Pro 
105 110 115 120 

AGG CTG AGA AGG AAA AGG CAG GTC CGC CGG GGC CAC CCT ACA GTG CAC 435 
Arg Leu Arg Arg Lys Arg Gln Val Arg Arg Gly His Pro Thr Val His 
125 130 135 

AGT GAA ACC AAG TAT GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG CTG 483 
Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln Leu 
140 145 150 

TTC GAG CAG ATG CGA CAG TCG GTG GTC CTC ACC AGC AAC TTT GCC AAG 531 
Phe Glu Gln Met Arg Gln Ser Val Val Leu Thr Ser Asn Phe Ala Lys 
155 160 165 

TCC GTG GTG AAC CTG GCC GAT GTG ATA TAC AAG GAG CAG CTC AAC ACT 579 
Ser Val Val Asn Leu Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn Thr 
170 175 180 

CGC ATC GTC CTG GTT GCC ATG GAA ACA TGG GCA GAT GGG GAC AAG ATC 627 
Arg Ile Val Leu Val Ala Met Glu Thr Trp Ala Asp Gly Asp Lys Ile 
185 190 195 200 

CAG GTG CAG GAT GAC CTC CTG GAG ACC CTG GCC CGG CTC ATG GTC TAC 675 
Gln Val Gln Asp Asp Leu Leu Glu Thr Leu Ala Arg Leu Met Val Tyr 
205 210 215 

CGA CGG GAG GGT CTG CCT GAG CCC AGT AAT GCC ACC CAC CTC TTC TCG 723 
Arg Arg Glu Gly Leu Pro Glu Pro Ser Asn Ala Thr His Leu Phe Ser 
220 225 230 

GGC AGG ACC TTC CAG AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG GGC 771 
Gly Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly 
235 240 245 

ATA TGC TCC CTG TCC CAT GGC GGG GGT GTG AAC GAG TAC GGC AAC ATG 819 
Ile Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu Tyr Gly Asn Met 
250 255 260 

GGG GCG ATG GCC GTG ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG GGC 867 
Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly 
265 270 275 280 

ATG ATG TGG AAC AAA CAC CGG AGC TCG GCA GGG GAC TGC AAG TGT CCA 915 
Met Met Trp Asn Lys His Arg Ser Ser Ala Gly Asp Cys Lys Cys Pro 
285 290 295 

GAC ATC TGG CTG GGC TGC ATC ATG GAG GAC ACT GGG TTC TAC CTG CCC 963 
Asp Ile Trp Leu Gly Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu Pro 
300 305 310 

CGC AAG TTC TCT CGC TGC AGC ATC GAC GAG TAC AAC CAG TTT CTG CAG 1011 
Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu Gln 
315 320 325 

GAG GGT GGT GGC AGC TGC CTC TTC AAC AAG CCC CTC AAG CTC CTG GAC 1059 
Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu Asp 
330 335 340 

CCC CCA GAG TGC GGG AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC GAC 1107 
Pro Pro Glu Cys Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp 
345 350 355 360 

TGC GGC TCG GTG CAG GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC AAG 1155 
Cys Gly Ser Val Gln Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys 
365 370 375 

AAA TGC ACC CTG ACT CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC TGT 1203 
Lys Cys Thr Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys 
380 385 390 

CGC CGC TGC AAG TAC GAA CCA CGG GGT GTG TCc TGC CGA GAG GCC GTG 1251 
Arg Arg Cys Lys Tyr Glu Pro Arg Gly Val Ser Cys Arg Glu Ala Val 
395 400 405 

AAC GAG TGC GAC ATC GCG GAG ACC TGC ACC GGG GAC TCT AGC CAG TGC 1299 
Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln Cys 
410 415 420 

CCG CCT AAC CTG CAC AAG CTG GAC GGT TAC TAC TGT GAC CAT GAG CAG 1347 
Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu Gln 
425 430 435 440 

GGC CGC TGC TAC GGA GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC CAG 1395 
Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys Gln 
445 450 455 

GTT CTT TGG GGC CAT GCG GCT GCT GAT CGC TTC TGC TAC GAG AAG CTG 1443 
Val Leu Trp Gly His Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys Leu 
460 465 470 

AAT GTG GAG GGG ACG GAG CGT GGG AGC TGT GGG CGC AAG GGA TCC GGC 1491 
Asn Val Glu Gly Thr Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser Gly 
475 480 485 

TGG GTC CAG TGC AGT AAG CAG GAC GTG CTG TGT GGC TTC CTC CTC TGT 1539 
Trp Val Gln Cys Ser Lys Gln Asp Val Leu Cys Gly Phe Leu Leu Cys 
490 495 500 

GTC AAC ATC TCT GGA GCT CCT CGG CTA GGG GAC CTG GTG GGA GAC ATC 1587 
Val Asn Ile Ser Gly Ala Pro Arg Leu Gly Asp Leu Val Gly Asp Ile 
505 510 515 520 

AGT AGT GTC ACC TTC TAC CAC CAG GGC AAG GAG CTG GAC TGC AGG GGA 1635 
Ser Ser Val Thr Phe Tyr His Gln Gly Lys Glu Leu Asp Cys Arg Gly 
525 530 535 

GGC CAC GTG CAG CTG GCG GAC GGC TCT GAC CTG AGC TAT GTG GAG GAT 1683 
Gly His Val Gln Leu Ala Asp Gly Ser Asp Leu Ser Tyr Val Glu Asp 
540 545 550 

GGC ACA GCC TGC GGG CCT AAC ATG TTG TGC CTG GAC CAT CGC TGC CTG 1731 
Gly Thr Ala Cys Gly Pro Asn Met Leu Cys Leu Asp His Arg Cys Leu 
555 560 565 

CCA GCT TCT GCC TTC AAC TTC AGC ACC TGC CCC GGC AGT GGG GAG CGC 1779 
Pro Ala Ser Ala Phe Asn Phe Ser Thr Cys Pro Gly Ser Gly Glu Arg 
570 575 580 

CGG ATT TGC TCC CAC CAC GGG GTC TGC AGC AAT GAA GGG AAG TGC ATC 1827 
Arg Ile Cys Ser His His Gly Val Cys Ser Asn Glu Gly Lys Cys Ile 
585 590 595 600 

TGT CAG CCA GAC TGG ACA GGC AAA GAC TGC AGT ATC CAT AAC CCC CTG 1875 
Cys Gln Pro Asp Trp Thr Gly Lys Asp Cys Ser Ile His Asn Pro Leu 
605 610 615 

CCC ACG TCC CCA CCC ACG GGG GAG ACG GAG AGA TAT AAA GGT CCC AGC 1923 
Pro Thr Ser Pro Pro Thr Gly Glu Thr Glu Arg Tyr Lys Gly Pro Ser 
620 625 630 

GGC ACC AAC ATC ATC ATT GGC TCC ATC GCT GGG GCT GTC CTG GTT GCA 1971 
Gly Thr Asn Ile Ile Ile Gly Ser Ile Ala Gly Ala Val Leu Val Ala 
635 640 645 

GCC TAC GTC CTG GGC GGC ACG GGC TGG GGA TTT AAA AAC ATT CGC CGA 2019 
Ala Ile Val Leu Gly Gly Thr Gly Trp Gly Phe Lys Asn Ile Arg Arg 
650 655 660 

GGA AGG TCC GGA GGG GCC TAAGTGCCAC CCTCCTCCCT CCAAGCCTGG 2067 
Gly Arg Ser Gly Gly Ala 
665 670 

CACCCACCGT CTCGGCCCTG AACCACGAGG CTGCCCCCAT CCAGCCACGG AGGGAGGCAC 2127 
CATGCAAATG TCTTCCAGGT CCAAACCCTT CAACTCCTGG CTCCGCAGGG GTTTGGGTGG 2187 
GGGCTGTGGC CCTGCCCTTG GCACCACCAG GGTGGACCAG GCCTGGAGGG CACTTCCTCC 2247 
ACAGTCCCCC ACCCACCTCC TGCGGCTCAG CCTTGCACAC CCACTGCCCC GTGTGAATGT 2307 
AGCTTCCACC TCATGGATTG CCACAGCTCA ACTCGGGGGC ACCTGGAGGG ATGCCCCCAG 2367 
GCAGCCACCA GTGGACCTAG CCTGGATGGC CCCTCCTTGC AACCAGGCAG CTGAGACCAG 2427 
GGTCTTATCT CTCTGGGACC TAGGGGGACG GGGCTGACAT CTACATTTTT TAAAACTGAA 2487 
TCTTAATCGA TGAATGTAAA CTCGGGGGTG CTGGGGCCAG GGCAGATGTG GGGATGTTTT 2547 
GACATTTACA GGAGGCCCCG GAGAAACTGA GGTATGGCCA TGCCCTAGAC CCTCCCCAAG 2607 
GATGACCACA CCCGAAGTCC TGTCACTGAG CACAGTCAGG GGCTGGGCAT CCCAGCTTGC 2667 
CCCCGCTTAG CCCCGCTGAG CTTGGAGGAA GTATGAGTGC TGATTCAAAC CAAAGCTGCC 2727 
TGTGCCATGC CCAAGGCCTA GGTTATGGGT ACGGCAACCA CATGTCCCAG ATCGTCTCCA 2787 
ATTCGAAAAC AACCGTCCTG CTGTCCCTGT CAGGACACAT GGATTTTGGC AGGGCGGGGG 2847 
GGGGTTCTAG AAAATATAGG TTCCTATAAT AAAATGGCAC CTTCCCCCTT TAAAAAAAAA 2907 
AAAAAA 2913 
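
The pairing of codon rows with three-letter residue rows in SEQ ID NO: 7 can be spot-checked mechanically. The following is a minimal sketch, assuming the standard genetic code applies; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are illustrative helpers and do not appear in the specification. It translates the first coding row of SEQ ID NO: 7 and compares the result with the residues Met Cys Trp Leu Ser His Gln Leu.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue names; "***" marks a stop codon.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "***",
}


def translate(codon_row: str) -> list[str]:
    """Translate a row of space-separated codons into three-letter residues."""
    seq = "".join(codon_row.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("row length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[seq[i:i + 3]]] for i in range(0, len(seq), 3)]


# First coding row of SEQ ID NO: 7 (bases 28 to 51) and its listed residues.
row = "ATG TGC TGG CTG AGC CAC CAA CTC"
listed = "Met Cys Trp Leu Ser His Gln Leu"

print(" ".join(translate(row)))
print("matches listing:", translate(row) == listed.split())
```

The same helper can be applied to any other codon row; a mismatch would point to a transcription error in either the codon row or the residue row rather than establish which of the two is wrong.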

(2) INFORMATION FOR SEQ ID NO: 8: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3183 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULAR TYPE: cDNA to mRNA 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 
(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: human fetal brain cDNA library 
(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..2307 
(ix) FEATURE: 

(A) NAME/KEY: 3' UTR 

(B) LOCATION: 2308..3183 
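
As a consistency check on the feature table for SEQ ID NO: 8, the sketch below computes each feature length from its inclusive LOCATION range. It is only an illustration: `feature_length` and the dictionary layout are hypothetical names, and the coordinates are the ones listed above. The CDS of 2307 bases should divide into a whole number of codons, which agrees with the 769 residues given for this sequence.

```python
def feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based, inclusive coordinates."""
    if end < start:
        raise ValueError(f"end {end} precedes start {start}")
    return end - start + 1


# Values copied from the SEQ ID NO: 8 header and feature table.
SEQ8_LENGTH = 3183
SEQ8_FEATURES = {"CDS": (1, 2307), "3' UTR": (2308, 3183)}

lengths = {name: feature_length(*loc) for name, loc in SEQ8_FEATURES.items()}
cds_codons, remainder = divmod(lengths["CDS"], 3)
covers_sequence = sum(lengths.values()) == SEQ8_LENGTH

for name, n in lengths.items():
    print(f"{name}: {n} bases")
print(f"CDS codons: {cds_codons}, leftover bases: {remainder}")
print("features account for the full length:", covers_sequence)
```

Because the CDS range ends immediately before the TAA stop codon, the codon count equals the residue count without any adjustment for the stop.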

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

ATG AGG CTG CTG CGG CGC TGG GCG TTC GCG GCT CTG CTG CTG TCG CTG 48 
Met Arg Leu Leu Arg Arg Trp Ala Phe Ala Ala Leu Leu Leu Ser Leu 
1 5 10 15 

CTC CCC ACG CCC GGT CTT GGG ACC CAA GGT ccT GCT GGA GCT CTG Cga 96 
Leu Pro Thr Pro Gly Leu Gly Thr Gln Gly Pro Ala Gly Ala Leu Arg 
20 25 30 

TGG GGG GGC TTA CCC CAG CTG GGA GGC CCA GGA GCC CCT GAG GTC ACG 144 
Trp Gly Gly Leu Pro Gln Leu Gly Gly Pro Gly Ala Pro Glu Val Thr 
35 40 45 

GAA CCC AGC CGT CTG GTT AGG GAG AGC TCC GGG GGA GAG GTC CGA AAG 192 
Glu Pro Ser Arg Leu Val Arg Glu Ser Ser Gly Gly Glu Val Arg Lys 
50 55 60 

CAG CAG CTG GAC ACA AGG GTC CGC CAG GAG CCA CCA GGG GGC CCG CCT 240 
Gln Gln Leu Asp Thr Arg Val Arg Gln Glu Pro Pro Gly Gly Pro Pro 
65 70 75 80 

GTC CAT CTG GCC CAG GTG AGT TTC GTC ATC CCA GCC TTC AAC TCA AAC 288 
Val His Leu Ala Gln Val Ser Phe Val Ile Pro Ala Phe Asn Ser Asn 
85 90 95 

TTC ACC CTG GAC CTG GAG CTG AAC CAC CAc CTC CTC TCC TCG CAA TAC 336 
Phe Thr Leu Asp Leu Glu Leu Asn His His Leu Leu Ser Ser Gln Tyr 
100 105 110 

GTG GAG CGC CAC TTC AGC CGG GAG GGG ACA ACC CAG CAC AGC ACC GGG 384 
Val Glu Arg His Phe Ser Arg Glu Gly Thr Thr Gln His Ser Thr Gly 
115 120 125 

GCT GGA GAC CAC TGC TAC TAC CAG GGG AAG CTC CGG GGG AAC CCG CAC 432 
Ala Gly Asp His Cys Tyr Tyr Gln Gly Lys Leu Arg Gly Asn Pro His 
130 135 140 

TCC TTC GCC GCC CTC TCC ACC TGC CAG GGG CTG CAT GGG GTC TTC TCT 480 
Ser Phe Ala Ala Leu Ser Thr Cys Gln Gly Leu His Gly Val Phe Ser 
145 150 155 160 

GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC CAA GAG GTG GCT GGA CCT 528 
Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln Glu Val Ala Gly Pro 
165 170 175 

TGG GGA GCC CCT CAG GGA CCC CTT CCC CAC CTC ATT TAC CGG ACC CCT 576 
Trp Gly Ala Pro Gln Gly Pro Leu Pro His Leu Ile Tyr Arg Thr Pro 
180 185 190 

CTC CTC CCA GAT CCC CTC GGA TGC AGG GAA CCA GGC TGC CTG TTT GCT 624 
Leu Leu Pro Asp Pro Leu Gly Cys Arg Glu Pro Gly Cys Leu Phe Ala 
195 200 205 

GTG CCT GCC CAG TCG GCT CCT CCA AAC CGG CCG AGG CTG AGA AGG AAA 672 
Val Pro Ala Gln Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys 
210 215 220 

AGG CAG GTC CGC CGG GGC CAC CCT ACA GTG CAC AGT GAA ACC AAG TAT 720 
Arg Gln Val Arg Arg Gly His Pro Thr Val His Ser Glu Thr Lys Tyr 
225 230 235 240 

GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG CTG TTC GAG CAG ATG CGA 768 
Val Glu Leu Ile Val Ile Asn Asp His Gln Leu Phe Glu Gln Met Arg 
245 250 255 

CAG TCG GTG GTC CTC ACC AGC AAC TTT GCC AAG TCC GTG GTG AAC CTG 816 
Gln Ser Val Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu 






260 265 270 

GCC GAT GTG ATA TAC AAG GAG CAG CTC AAC ACT CGC ATC GTC CTG GTT 864 
Ala Asp Val Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val 
275 280 285 

GCC ATG GAA ACA TGG GCA GAT GGG GAC AAG ATC CAG GTG CAG GAT GAC 912 
Ala Met Glu Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp 
290 295 300 

CTC CTG GAG ACC CTG GCC CGG CTC ATG GTC TAC CGA CGG GAG GGT CTG 960 
Leu Leu Glu Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu 
305 310 315 320 

CCT GAG CCC AGT AAT GCC ACC CAC CTC TTC TCG GGC AGG ACC TTC CAG 1008 
Pro Glu Pro Ser Asn Ala Thr His Leu Phe Ser Gly Arg Thr Phe Gln 
325 330 335 

AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG GGC ATA TGC TCC CTG TCC 1056 
Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly Ile Cys Ser Leu Ser 
340 345 350 

CAT GGC GGG GGT GTG AAC GAG TAC GGC AAC ATG GGG GCG ATG GCC GTG 1104 
His Gly Gly Gly Val Asn Glu Tyr Gly Asn Met Gly Ala Met Ala Val 
355 360 365 

ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG GGC ATG ATG TGG AAC AAA 1152 
Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly Met Met Trp Asn Lys 
370 375 380 

CAC CGG AGC TCG GCA GGG GAC TGC AAG TGT CCA GAC ATC TGG CTG GGC 1200 
His Arg Ser Ser Ala Gly Asp Cys Lys Cys Pro Asp Ile Trp Leu Gly 
385 390 395 400 

TGC ATC ATG GAG GAC ACT GGG TTC TAC CTG CCC CGC AAG TTC TCT CGC 1248 
Cys Ile Met Glu Asp Thr Gly Phe Tyr Leu Pro Arg Lys Phe Ser Arg 
405 410 415 

TGC AGC ATC GAC GAG TAC AAC CAG TTT CTG CAG GAG GGT GGT GGC AGC 1296 
Cys Ser Ile Asp Glu Tyr Asn Gln Phe Leu Gln Glu Gly Gly Gly Ser 
420 425 430 

TGC CTC TTC AAC AAG CCC CTC AAG CTC CTG GAC CCC CCA GAG TGC GGG 1344 
Cys Leu Phe Asn Lys Pro Leu Lys Leu Leu Asp Pro Pro Glu Cys Gly 
435 440 445 

AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC GAC TGC GGC TCG GTG CAG 1392 
Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val Gln 
450 455 460 

GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC AAG AAA TGC ACC CTG ACT 1440 
Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr Leu Thr 
465 470 475 480 

CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC TGT CGC CGC TGC AAG TAC 1488 
His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys Lys Tyr 
485 490 495 

GAA CCA CGG GGT GTG TCC TGC CGA GAG GCC GTG AAC GAG TGC GAC ATC 1536 
Glu Pro Arg Gly Val Ser Cys Arg Glu Ala Val Asn Glu Cys Asp Ile 
500 505 510 

GCG GAG ACC TGC ACC GGG GAC TCT AGC CAG TGC CCG CCT AAC CTG CAC 1584 
Ala Glu Thr Cys Thr Gly Asp Ser Ser Gln Cys Pro Pro Asn Leu His 
515 520 525 

AAG CTG GAC GGT TAC TAC TGT GAC CAT GAG CAG GGC CGC TGC TAC GGA 1632 
Lys Leu Asp Gly Tyr Tyr Cys Asp His Glu Gln Gly Arg Cys Tyr Gly 
530 535 540 

GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC CAG GTT CTT TGG GGC CAT 1680 
Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys Gln Val Leu Trp Gly His 
545 550 555 560 

GCG GCT GCT GAT CGC TTC TGC TAC GAG AAG CTG AAT GTG GAG GGG ACG 1728 
Ala Ala Ala Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr 
565 570 575 

GAG CGT GGG AGC TGT GGG CGC AAG GGA TCC GGC TGG GTC CAG TGC AGT 1776 
Glu Arg Gly Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser 
580 585 590 

AAG CAG GAC GTG CTG TGT GGC TTC CTC CTC TGT GTC AAC ATC TCT GGA 1824 
Lys Gln Asp Val Leu Cys Gly Phe Leu Leu Cys Val Asn Ile Ser Gly 
595 600 605 

GCT CCT CGG CTA GGG GAC CTG GTG GGA GAC ATC AGT AGT GTC ACC TTC 1872 
Ala Pro Arg Leu Gly Asp Leu Val Gly Asp Ile Ser Ser Val Thr Phe 
610 615 620 

TAC CAC CAG GGC AAG GAG CTG GAC TGC AGG GGA GGC CAC GTG CAG CTG 1920 
Tyr His Gln Gly Lys Glu Leu Asp Cys Arg Gly Gly His Val Gln Leu 
625 630 635 640 

GCG GAC GGC TCT GAC CTG AGC TAT GTG GAG GAT GGC ACA GCC TGC GGG 1968 
Ala Asp Gly Ser Asp Leu Ser Tyr Val Glu Asp Gly Thr Ala Cys Gly 
645 650 655 

CCT AAC ATG TTG TGC CTG GAC CAT CGC TGC CTG CCA GCT TCT GCC TTC 2016 
Pro Asn Met Leu Cys Leu Asp His Arg Cys Leu Pro Ala Ser Ala Phe 
660 665 670 

AAC TTC AGC ACC TGC CCC GGC AGT GGG GAG CGC CGG ATT TGC TCC CAC 2064 
Asn Phe Ser Thr Cys Pro Gly Ser Gly Glu Arg Arg Ile Cys Ser His 
675 680 685 

CAC GGG GTC TGC AGC AAT GAA GGG AAG TGC ATC TGT CAG CCA GAC TGG 2112 
His Gly Val Cys Ser Asn Glu Gly Lys Cys Ile Cys Gln Pro Asp Trp 
690 695 700 

ACA GGC AAA GAC TGC AGT ATC CAT AAC CCC CTG CCC ACG TCC CCA CCC 2160 
Thr Gly Lys Asp Cys Ser Ile His Asn Pro Leu Pro Thr Ser Pro Pro 
705 710 715 720 

ACG GGG GAG ACG GAG AGA TAT AAA GGT CCC AGC GGC ACC AAC ATC ATC 2208 
Thr Gly Glu Thr Glu Arg Tyr Lys Gly Pro Ser Gly Thr Asn Ile Ile 
725 730 735 

ATT GGC TCC ATC GCT GGG GCT GTC CTG GTT GCA GCC ATC GTC CTG GGC 2256 
Ile Gly Ser Ile Ala Gly Ala Val Leu Val Ala Ala Ile Val Leu Gly 
740 745 750 

GGC ACG GGC TGG GGA TTT AAA AAC ATT CGC CGA GGA AGG TCC GGA GGG 2304 
Gly Thr Gly Trp Gly Phe Lys Asn Ile Arg Arg Gly Arg Ser Gly Gly 
755 760 765 

GCC TAAGTGCCAC CCTCCTCCCT CCAAGCCTGG CACCCACCGT CTCGGCCCTG 2357 
Ala 





AACCACGAGG CTGCCCCCAT CCAGCCACGG AGGGAGGCAC CATGCAAATG TCTTCCAGGT 2417 
CCAAACCCTT CAACTCCTGG CTCCGCAGGG GTTTGGGTGG GGGCTGTGGC CCTGCCCTTG 2477 
GCACCACCAG GGTGGACCAG GCCTGGAGGG CACTTCCTCC ACAGTCCCCC ACCCACCTCC 2537 
TGCGGCTCAG CCTTGCACAC CCACTGCCCC GTGTGAATGT AGCTTCCACC TCATGGATTG 2597 
CCACAGCTCA ACTCGGGGGC ACCTGGAGGG ATGCCCCCAG GCAGCCACCA GTGGACCTAG 2657 
CCTGGATGGC CCCTCCTTGC AACCAGGCAG CTGAGACCAG GGTCTTATCT CTCTGGGACC 2717 
TAGGGGGACG GGGCTGACAT CTACATTTTT TAAAACTGAA TCTTAATCGA TGAATGTAAA 2777 
CTCGGGGGTG CTGGGGCCAG GGCAGATGTG GGGATGTTTT GACATTTACA GGAGGCCCCG 2837 
GAGAAACTGA GGTATGGCCA TGCCCTAGAC CCTCCCCAAG GATGACCACA CCCGAAGTCC 2897 
TGTCACTGAG CACAGTCAGG GGCTGGGCAT CCCAGCTTGC CCCCGCTTAG CCCCGCTGAG 2957 
CTTGGAGGAA GTATGAGTGC TGATTCAAAC CAAAGCTGCC TGTGCCATGC CCAAGGCCTA 3017 
GGTTATGGGT ACGGCAACCA CATGTCCCAG ATCGTCTCCA ATTCGAAAAC AACCGTCCTG 3077 
CTGTCCCTGT CAGGACACAT GGATTTTGGC AGGGCGGGGG GGGGTTCTAG AAAATATAGG 3137 
TTCCTATAAT AAAATGGCAC CTTCCCCCTT TAAAAAAAAA AAAAAA 3183 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9278 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 
(vii) IMMEDIATE SOURCE: 
(A) LIBRARY: human DNA cosmid library 

(ix) FEATURE: 
(A) NAME/KEY: exon 1 
(B) LOCATION: 28..44 
(ix) FEATURE: 
(A) NAME/KEY: exon 2 
(B) LOCATION: 308..374 
(ix) FEATURE: 
(A) NAME/KEY: exon 3 
(B) LOCATION: 909..994 
(ix) FEATURE: 
(A) NAME/KEY: exon 4 
(B) LOCATION: 1081..1156 
(ix) FEATURE: 
(A) NAME/KEY: exon 5 
(B) LOCATION: 1591..1657 
(ix) FEATURE: 
(A) NAME/KEY: exon 6 
(B) LOCATION: 1725..1792 
(ix) FEATURE: 
(A) NAME/KEY: exon 7 
(B) LOCATION: 2182..2256 

(ix) FEATURE: 
(A) NAME/KEY: exon 8 
(B) LOCATION: 2339..2410 
(ix) FEATURE: 
(A) NAME/KEY: exon 9 
(B) LOCATION: 2588..2754 
(ix) FEATURE: 
(A) NAME/KEY: exon 10 
(B) LOCATION: 3248..3332 
(ix) FEATURE: 
(A) NAME/KEY: exon 11 
(B) LOCATION: 3445..3535 
(ix) FEATURE: 
(A) NAME/KEY: exon 12 
(B) LOCATION: 3645..3696 
(ix) FEATURE: 
(A) NAME/KEY: exon 13 
(B) LOCATION: 4014..4113 
(ix) FEATURE: 
(A) NAME/KEY: exon 14 
(B) LOCATION: 4196..4267 
(ix) FEATURE: 
(A) NAME/KEY: exon 15 
(B) LOCATION: 4386..4478 
(ix) FEATURE: 
(A) NAME/KEY: exon 16 
(B) LOCATION: 4920..5000 
(ix) FEATURE: 
(A) NAME/KEY: exon 17 
(B) LOCATION: 5347..5397 
(ix) FEATURE: 
(A) NAME/KEY: exon 18 
(B) LOCATION: 5501..5564 
(ix) FEATURE: 
(A) NAME/KEY: exon 19 
(B) LOCATION: 5767..5866 
(ix) FEATURE: 
(A) NAME/KEY: exon 20 
(B) LOCATION: 6073..6202 
(ix) FEATURE: 
(A) NAME/KEY: exon 21 
(B) LOCATION: 6300..6468 
(ix) FEATURE: 
(A) NAME/KEY: exon 22 
(B) LOCATION: 6557..6671 
(ix) FEATURE: 
(A) NAME/KEY: exon 23 
(B) LOCATION: 6756..6846 
(ix) FEATURE: 
(A) NAME/KEY: exon 24 
(B) LOCATION: 7829..7846 
(ix) FEATURE: 
(A) NAME/KEY: exon 25 
(B) LOCATION: 8165..9038 
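
The exon coordinates listed for SEQ ID NO: 9 are 1-based and inclusive, so exon and intron lengths follow directly from them. The sketch below is illustrative only: `EXONS`, `exon_lengths` and `intron_lengths` are hypothetical names, and just the first five exons are entered to keep the example short. It also checks that the exons are listed in ascending, non-overlapping order.

```python
# (exon number, start, end) for exons 1 to 5 of SEQ ID NO: 9, as listed.
EXONS = [
    (1, 28, 44),
    (2, 308, 374),
    (3, 909, 994),
    (4, 1081, 1156),
    (5, 1591, 1657),
]


def exon_lengths(exons):
    """Map exon number to its length in bases (inclusive coordinates)."""
    return {number: end - start + 1 for number, start, end in exons}


def intron_lengths(exons):
    """Map 'n-m' to the number of bases between consecutive exons n and m."""
    result = {}
    for (n1, _, end1), (n2, start2, _) in zip(exons, exons[1:]):
        if start2 <= end1:
            raise ValueError(f"exon {n2} overlaps or precedes exon {n1}")
        result[f"{n1}-{n2}"] = start2 - end1 - 1
    return result


print("exon lengths:", exon_lengths(EXONS))
print("intron lengths:", intron_lengths(EXONS))
```

Extending `EXONS` with the remaining entries allows the whole feature table to be checked in the same way; an entry whose end precedes its start would give a non-positive exon length.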
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GCGTTTACTG GCAAACCGCA TTTGTAA ATG TGC TGG CTG AGC CA NNNNNNNNNN 54 
Met Cys Trp Leu Ser His 
1 5 

NNNNCCAGGT GAGTTTCGTC ATCCAGCCTT CAACTCAAAC TTCACCCTGG ACCTGGAGCT 114 
GAACCAGTGA GNGTGGCCTT GAGCCCAAGA GGAAGGGCAG TGGTGGNNNG GGGGAGACAT 174 
GGCTAGGGCC TGGCTGCTGG GGGTCTGGGG GTTGGGCCTG GCGAGAGGGG ACCTGGGTCC 234 
TGACCTGAGG CGAGCCTAAA GCCCGACCTC ACCTCGCCCG TGACCCCCCT TCCTGCTGCC 294 
CCCTCTGTCT CAG C CAA CTC CTC TCC TCG CAA TAC GTG GAG CGC CAC TTC 344 
Gln Leu Leu Ser Ser Gln Tyr Val Glu Arg His Phe 
10 15 

AGC CGG GAG GGG ACA ACC CAG CAC AGC ACC GTGAGTGCCA CTGCTGGGGA 394 
Ser Arg Glu Gly Thr Thr Gln His Ser Thr 
20 25 

CCGGGGCCGG GGATGGAAGG GAGGTGCTGT TTCTGTGGTT CTGTGGTCAC AGGTGTAGGG 454 
ACAGGTGGCC ACTGGAGATG GGGTCCTGGG CCTGGCCCCT CAGCACCTTC CCTCTCTCCC 514 
GACCCAGGAG GCTCTGAGGG TGGACAGTGG GCAGCTTAGT GCATAGGGCC CTGAAGTCCC 574 
CTCACTTGGC CCCAGAGCTC TGACCCCCAG CCAGCCCACG TGGGGCCTAC AGGGACACTC 634 
GTTCCGAGCA GGCTGCCAGG ATCCNNNNNN NNNNNNATAG ATGACGTGAA GGAGGCCCAG 694 
AGGTTCCTAA CCCCAGAGGG CTAGGAACTT GCCCAGGGTG GCACGGCAAA TTAGGAGCAC 754 
CAGCCATCTA GAAACAGGCT CCAGAGCCCC AGGNATACCC AGGGATNGTG GCCACCTGCA 814 
CACAGGGCAG CTTCAGTGTC CCCCAAAAAG CCTTGAGGCC CATTGGCTGC CCCCGGCCTC 874 
ATGCCAGCGT TCTGCTCACT GTTCTGCTCC TTAG GGG GCT GGA GAC CAC TGC TAC 929 
Gly Ala Gly Asp His Cys Tyr 
30 35 

TAC CAG GGG AAG CTC CGG GGG AAC CCG CAC TCC TTC GCC GCC CTC TCC 977 
Tyr Gln Gly Lys Leu Arg Gly Asn Pro His Ser Phe Ala Ala Leu Ser 
40 45 50 

ACC TGC CAG GGG CTG CA GTGAGTATGG GGAGGGGCCG GGCAGCTGGG 1024 
Thr Cys Gln Gly Leu His 
55 

AGAAGCCTCT GGCCCAGGCC TGGGGACGGA GGGGAGCTGC GCCTCTCTCT CCACAG T 1081 
GGG GTC TTC TCT GAT GGG AAC TTG ACT TAC ATC GTG GAG CCC CAA GAG 1129 
Gly Val Phe Ser Asp Gly Asn Leu Thr Tyr Ile Val Glu Pro Gln Glu 
60 65 70 

GTG GCT GGA CCT TGG GGA GCC CCT CAG GTAAGCCCCA CACAACCCCT 1176 
Val Ala Gly Pro Trp Gly Ala Pro Gln 
75 80 

TGCCATCCTC TCTGGTGGCC CTGCCAAGCT TGTCCCAACA GCTGTTGCTG CCACCTCTTC 1236 
CTCCTCCGGC TCCTCCCTCA GTAACCCCAG CCTCACTGCC CTCTTCAGTG ACCCCAGCTC 1296 
TGGTTCCCTC CCTCCTGTGC CCCAGCTCCC CCTGTGCCCC CAGCTCCAAT GTCCCATCTG 1356 
TCCCATAAGT GACCTCCCAT TGGGCTCCAA TGTCCTTTGC CCCTGTCTCT CAGGGTGCCC 1416 
CCAGGTCTTG ACCCCGGAAT CTGAGCATCT GGGAGATCAG ATCCGACATG GGAGCTGTGG 1476 
CCAGTTCTGG GTCACCCCAG GGTGGGGTGG AGGCGAGGGC TGGATCTGGC CCCCGCCAAG 1536 
TGGCCTGGAG CAGGCCCAGT TGGCACCCCA AGAACTAATT TCCCCTCATT GCAG GGA 1593 
Gly 

CCC CTT CCC CAC CTC ATT TAC CGG ACC CCT CTC CTC CCA GAT CCC CTC 1641 
Pro Leu Pro His Leu Ile Tyr Arg Thr Pro Leu Leu Pro Asp Pro Leu 
85 90 95 

GGA TGC AGG GAA CCA G GTAAGGGAGG GGAGGGGGGG TGGGGAGGGG CCNGGCTGTG 1697 
Gly Cys Arg Glu Pro Gly 
100 

CCCCCCTCAC CTGCCCCTCC CCGACAG GC TGC CTG TTT GCT GTG CCT GCC CAG 1750 
Cys Leu Phe Ala Val Pro Ala Gln 
105 110 

TCG GCT CCT CCA AAC CGG CCG AGG CTG AGA AGG AAA AGG CAG 1792 
Ser Ala Pro Pro Asn Arg Pro Arg Leu Arg Arg Lys Arg Gln 
115 120 125 

GTACGGGGGC CCGCACAGAC CTCGGGCTGC AGAGACCTCG GGCTGCAGAG AGACCTCGGC 1852 
CGTGGCCCAG AGCAGGAGGG CACCCTCATC TATGGCTGGG GCGAAGGAAG GCTCAGATGG 1912 
ATGTGGCTGG GGGCCAGGGA CCGTGTCTGG GAGAAGCCCC CACCCCTTCC CTAATGCTGG 1972 
CATCTACAGA GGCCCCATCC TGGGCAAACC GAGGCTGCCT GCCCTCATTC CAAAGCTGAG 2032 
GAAGGACAGG ACCCTCTGCC AGTGGGGAGC TGGCACTGTC CCTGGCTGGA GTCCAGACCC 2092 
CCCCATCCCC ACCGAGTCTG TTCCTGGCTT GGCCATGAGA TCAGTCAGAC ATGGAAGGGA 2152 
CTGATTCCAA GTGCCCACCC ACCCCCCAG GTC CGC CGG GGC CAC CCT ACA GTG 2205 
Val Arg Arg Gly His Pro Thr Val 
130 135 

CAC AGT GAA ACC AAG TAT GTG GAG CTA ATT GTG ATC AAC GAC CAC CAG 2253 
His Ser Glu Thr Lys Tyr Val Glu Leu Ile Val Ile Asn Asp His Gln 
140 145 150 

CTG GTGAGTGCCA GGGCAGGGAC AGGGCGTGAC ACTGGGAGGC CCCTGAGGAG 2306 
Leu 

CCTGGCCCTC CTCCCATTCT TCTCTCTCCC AG TTC GAG CAG ATG CGA CAG TCG 2359 
Phe Glu Gln Met Arg Gln Ser 
155 

GTG GTC CTC ACC AGC AAC TTT GCC AAG TCC GTG GTG AAC CTG GCC GAT 2407 
Val Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu Ala Asp 
160 165 170 175 

GTG GTAAGCAGCT CTCCCTCCCT CCCTTCCCTC CTCCTCATGC CCCCCCACCC 2460 
Val 

CACCACACAC ATTAGGGGGC ACTGTCAGCC CCTGGCTCCC ACTTCCTGGA GAGAACAGAC 2520 
AGGCCCTCCT CCAGCCCTGG CCCCAACACC CACTCCCACC CTCCAGCCCC CCTCATCTTC 2580 
TCCCCAG ATA TAC AAG GAG CAG CTC AAC ACT CGC ATC GTC CTG GTT GCC 2629 
Ile Tyr Lys Glu Gln Leu Asn Thr Arg Ile Val Leu Val Ala 
180 185 190 

ATG GAA ACA TGG GCA GAT GGG GAC AAG ATC CAG GTG CAG GAT GAC CTC 2677 
Met Glu Thr Trp Ala Asp Gly Asp Lys Ile Gln Val Gln Asp Asp Leu 
195 200 205 

CTG GAG ACC CTG GCC CGG CTC ATG GTC TAC CGA CGG GAG GGT CTG CCT 2725 
Leu Glu Thr Leu Ala Arg Leu Met Val Tyr Arg Arg Glu Gly Leu Pro 
210 215 220 

GAG CCC AGT AAT GCC ACC CAC CTC TTC TC GTGAGTCCCC CACCCTGCAC 2774 
Glu Pro Ser Asn Ala Thr His Leu Phe Ser 
225 230 
CTCCTGCCAG CCTCTGCTAG TTGCTACAGT GCTTGGGATT ACTTAACACC TGCCCTGTGC 2834 
TGGCTGCTCC TCTCAGAGTC TGGGGACTGG GCTCACCTTG CACCTGCCAC CTACCCCCAG 2894 
CCACATGCAA CAGCTGGGCA TCATCCCCTG AATCTGAGGT TGATGCCCTT GTCTTAGCCC 2954 
TGGTGGTCCT CTTCTGCCTC TCACCTCCCC TTAGTTCTGT CTTTCCCTTC AACTGTCCCN 3014 
NNNNNNNNNN NAGAGTGAAA CTCTGTCTCA AAAGAAAAAN AAAANAAAAG AAGAAAAAAA 3074 
AGAACCCAAG GAGCGGGGGA AGGGTCTTGC CTGGGGTCAC CAAGGCTGAT GTAAAGGGCC 3134 
AGGCTCACCT CCTGAGGAAG GACTCTAGTG TGAGGGGCTC CCCAAGGCCC CACCACCACC 3194 
CGGGGAGCCA CAGGGGAGGG CAGAAGCCAT CCTGACAGCG CACTCCCTTC CAG G GGC 3251 
Gly 

AGG ACC TTC CAG AGC ACG AGC AGC GGG GCA GCC TAC GTG GGG GGC ATA 3299 
Arg Thr Phe Gln Ser Thr Ser Ser Gly Ala Ala Tyr Val Gly Gly Ile 
235 240 245 

TGC TCC CTG TCC CAT GGC GGG GGT GTG AAC GAG GTGAGCAGTG 3342 
Cys Ser Leu Ser His Gly Gly Gly Val Asn Glu 
250 255 260 

GGGGGACATG GCTGGGGTGG CGGCTGAGGG AAAGGGGCTT AGGGGCACGA CGTGCCTGNT 3402 
TGGAAGATGT AGACATCTGT GCCCCATCTT CCCCACCCCC AG TAC GGC AAC ATG 3456 
Tyr Gly Asn Met 

GGG GCG ATG GCC GTG ACC CTT GCC CAG ACG CTG GGA CAG AAC CTG GGC 3504 
Gly Ala Met Ala Val Thr Leu Ala Gln Thr Leu Gly Gln Asn Leu Gly 
265 270 275 280 

ATG ATG TGG AAC AAA CAC CGG AGC TCG GCA G GTATCCTCCC CCAGAGGCCC 3555 
Met Met Trp Asn Lys His Arg Ser Ser Ala Gly 
285 290 

CCGTGTGGCC CAGCAGCTCT GGAACGGGAG GGTGACAGTG GGAGGGGTGG TCCTTGGCCT 3615 
CCCTCATATC CGCCTGGCTC ACCCCTCAG GG GAC TGC AAG TGT CCA GAC ATC 3667 
Asp Cys Lys Cys Pro Asp Ile 
295 

TGG CTG GGC TGC ATC ATG GAG GAC ACT GG GTGAGTTCTT GGGGACAACC 3716 
Trp Leu Gly Cys Ile Met Glu Asp Thr Gly 
300 305 
GGGGGAAGGT CTTGGGCGAG GGGAGTCTTA G AG CGAGC AT TGTTTGGCAG TCTGGACCAG 3 7 76 
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GGGNNNNNNN NNNNNGAACA CACCTTCCCT TCCAGGCCGG CTTGCGAGTC CCAGGTTCAA 3 836 
GCGAGGGATG GGAGCGACAA GGGACAAGGC GGAGGATTCT GGTGCAATCC CGGGGCAGAT 3896
CCTCCGCCTC CTCGCGATGG TGACGAAGTC CCCCAGTGTA CCCCCTCCCC AGCCTTGAGA 3956
GGGGTGAGGG TGGGTTGGAG GGGAGCAGCC AGCAGCACCT CCCCTCGCCC TATCCAG G 4014
TTC TAC CTG CCC CGC AAG TTC TCT CGC TGC AGC ATC GAC GAG TAC AAC 4062
Phe Tyr Leu Pro Arg Lys Phe Ser Arg Cys Ser Ile Asp Glu Tyr Asn
310 315 320

CAG TTT CTG CAG GAG GGT GGT GGC AGC TGC CTC TTC AAC AAG CCC CTC 4110
Gln Phe Leu Gln Glu Gly Gly Gly Ser Cys Leu Phe Asn Lys Pro Leu
325 330 335 340

AAG GTACCAGCCC CGCGGCGGGG AGCATGGGAG CGGGCCCTGG GCGGGGTCCG 4163
Lys

GGCCAGACTC CCGACCTGTC CTCCCCGTCC AC CTC CTG GAC CCC CCA GAG TGC 4216
Leu Leu Asp Pro Pro Glu Cys
345

GGG AAC GGC TTC GTG GAG GCA GGG GAG GAG TGC GAC TGC GGC TCG GTG 4264
Gly Asn Gly Phe Val Glu Ala Gly Glu Glu Cys Asp Cys Gly Ser Val
350 355 360

CAG GTGAGCGGTG GTGCGGGCGC CAGGTGGGGA ACCGGGATGC GGGGGTGGGC 4317
Gln
365

ACCAGGGAGC GTCTGAGTGG GAGGATTAGG GCTCGCCCGC CTCCTTCCCC TCCTCCCGCG 4377
TCCCTCAG GAG TGC AGC CGC GCA GGT GGC AAC TGC TGC AAG AAA TGC ACC 4427
Glu Cys Ser Arg Ala Gly Gly Asn Cys Cys Lys Lys Cys Thr
370 375

CTG ACT CAC GAC GCC ATG TGC AGC GAC GGG CTC TGC TGT CGC CGC TGC 4475
Leu Thr His Asp Ala Met Cys Ser Asp Gly Leu Cys Cys Arg Arg Cys
380 385 390 395

AAG GTAAGCAGGA CCGGCCGGGA GGCGGGGCCA GGACGCAGGA GGAGCGATTG 4528
Lys

GAGGCCTTCA TATAAGGGGT GGGAGCTAGG GAGGGAAGCG GAGCCTTCGG GGACGAAGGC 4588
CTCTGGGGCA GGGCTTGATG CGAAGACAGC GCCAATGGGA GCAAGGGCGG GCTGAAGGAT 4648
GTTGAAGGCN NNNNNNNNNN NNNCGGACGG GAAGCTCCCA GAATCAAGGA GGGCGGGAAG 4708
GTGGGCGGGC TTGGGGCGGT GCTGAGTGCG CTGGGAGCGA GGTGGGGAGC GTTCAAGAGG 4768
TGGTGGGAGC AGGGAAATAA GAACAGGCCT AAACGGGGCC CTGGGGAGCT GGAGGGCCCG 4828
GGGATGTGGG GGTCCAGAGA GCGGGGGGCC TGGGGAGGGC AGGGCCGAGG CATCCATCCT 4888
GCCTGACTCG AGGAGCGCGT CTCTTCCCTA G TAC GAA CCA CGG GGT GTG TCC 4940
Tyr Glu Pro Arg Gly Val Ser
400

TGC CGA GAG GCC GTG AAC GAG TGC GAC ATC GCG GAG ACC TGC ACC GGG 4988
Cys Arg Glu Ala Val Asn Glu Cys Asp Ile Ala Glu Thr Cys Thr Gly
405 410 415

GAC TCT AGC CAG GTCCGCCCGG CCCCGCCGTC TTGTGGAGCC CTGGGCGAGG 5040
Asp Ser Ser Gln
420

CAACCCCTAC CCTTGTCGAT TTGGTTTTCC CGGACGAGTG CTCAGCACTC CCCTCCTCTC 5100
CACAGCTGGC ATCGACCTTC ACTGATCAGA CTGTTTTCTT ATCTGAGAAA GGGGTTCTTC 5160
ATGCTCCTGG CCTTGTTCCT TCAATCATTA AACCAGAATG TATCGTCTGG CTGGTATCCC 5220
AGCGCCTGGG CCCGGTGNNN NNNNNNNNTA CCCAGATTCC TCCTGGGCAG CCCTCAGCTC 5280
CAGTCCTGGG CAGCCCTCAG CCCAGTCCTG GGACTGCTCC GCTCAACCCC ACCCCTCTCT 5340
CCACAG TGC CCG CCT AAC CTG CAC AAG CTG GAC GGT TAC TAC TGT GAC 5388
Cys Pro Pro Asn Leu His Lys Leu Asp Gly Tyr Tyr Cys Asp
425 430 435

CAT GAG CAG GTATGATGGC TGCCCCCTGA GCCTGGGATT CAGGGCAGTC 5437
His Glu Gln
440

TCTTATCTCC ACTCTGACCA CTCAGCATCT CCATCCCTTG CCTCTTAATT CTTGGACTCT 5497
CAG GGC CGC TGC TAC GGA GGT CGC TGC AAA ACC CGG GAC CGG CAG TGC 5545
Gly Arg Cys Tyr Gly Gly Arg Cys Lys Thr Arg Asp Arg Gln Cys
445 450 455

CAG GTT CTT TGG GGC CAT G GTGAGTCTGC TAGGGCTGGA GTGGGACTCC 5594
Gln Val Leu Trp Gly His Ala
460

GGAGGAGCCC AGAGCTGAGA AGCTGGGGAG AGTGGGTTCC AGCTGAACAG GCCCCCAAGT 5654
GTGTAGCTCC CCAGGATCTC AGGGAGCCCA GGCAGAGTGT GGGAGATGCA GGCCTGAGGT 5714
CTTGGGGTGG GTCCTGGGGC ACGTGGGGTC ACTTGGCATC CTCTCCCCAC AG CG GCT 5771
Ala

GCT GAT CGC TTC TGC TAC GAG AAG CTG AAT GTG GAG GGG ACG GAG CGT 5819
Ala Asp Arg Phe Cys Tyr Glu Lys Leu Asn Val Glu Gly Thr Glu Arg
465 470 475

GGG AGC TGT GGG CGC AAG GGA TCC GGC TGG GTC CAG TGC AGT AAG CA 5866
Gly Ser Cys Gly Arg Lys Gly Ser Gly Trp Val Gln Cys Ser Lys Gln
480 485 490 495

GTGAGTACTG AGGCTCCCAG AGGGCCTCTC AGCTCCAGGG CAGGTGTGAG ACTTTTCAGA 5926
GATGGGGTAG TAGGTTCTCC CAGGAGGAGC CTGTCAGTCC CAATGGGCGG GCACGTGGCA 5986
AATGAGGTGG CAGGGTGCAG GGTGAGGGCA GATTAGAGTT CAGTAGTTGA GTCTGAGGTC 6046
AAACTTGGGG CTCACTGTCT CTATAT G CCC CAA CAG GGA CGT GCT GTG TGG 6097
Pro Gln Gln Gly Arg Ala Val Trp
500

CTT CCT CCT CTG TGT CAA CAT CTC TGG AGC TCC TCG GCT AGG GGA CCT 6145
Leu Pro Pro Leu Cys Gln His Leu Trp Ser Ser Ser Ala Arg Gly Pro
505 510 515

GGT GGG AGA CAT CAG TAGTGTCACC TTCTACCACC AGGGCAAGGA GCTGGACTGC 6200
Gly Gly Arg His Gln
520

AGGTGCTGAC CAGCACCAAA ACTCAGGGAG GGGACCTGGC AGCTGTGCTG GGGGTTAGAA 6260
GATCTGGGGG CTGGAGGCTG GGCTGTGTCA CTTCCCCAGG GGAGGCCACG TGCAGCTGGC 6320
GGACGGCTCT GACCTGAGCT ATGTGGAGGA TGGCACAGCC TGCGGGCCTA ACATGTTGTG 6380
CCTGGACCAT CGCTGCCTGC CAGCTTCTGC CTTCAACTTC AGCACCTGCC CCGGCAGTGG 6440
GGAGCGCCGG ATTTGCTCCC ACCACGGGGT GACTGCCTGG AGCCCGGGAT GGCGGGAGAA 6500
GCTTACAAGA GGGGACAGGC CCCTGCTCAC CTCTCCTGGC CCTGCCCTGC CTCTAGGTCT 6560
GCAGCAATGA AGGGAAGTGC ATCTGTCAGC CAGACTGGAC AGGCAAAGAC TGCAGTATCC 6620
ATAACCCCCT GCCCACGTCC CCACCCACGG GGGAGACGGA GAGATATAAA GGTGAGGCTG 6680
GAGCTGGCCG AGGGGGGTCT GTCTGTCCCG CTCTCTATGC CTGTCCTTGC CAGCTAAGCC 6740
CTGCCATCCT CCCAGGTCCC AGCGGCACCA ACATCATCAT TGGCTCCATC GCTGGGGCTG 6800
TCCTGGTTGC AGCCATCGTC CTGGGCGGCA CGGGCTGGGG ATTTAAGTAA GAGACACACA 6860
CACCCTGTGC CCCCTGGCAT CCTTGAGGGG GGATCAGAAT CCCTACTGGT GGAGCTGAGG 6920
GGGCCCTCCC TGAAAGCCCA ACTGAACCAG AGCTCACACG TCATAGGTCC AAGTAGCCTG 6980
CAGGGCTTAA CATTTAGAAA CTAGGAGATT TTAGGCTAGA TGAGGTGCTC ACGCCTGTAA 7040
TCCCAGCACT TTGGGAGGCC AAGGCAGGCG GATCACCTGA GGTCAGGAAT TCAAGACCAG 7100
TCTGGCCAAC ATGGTGAAAC CCGTCTCTAT TAAAAATACA AAAATTAGCC AGCCATGGTG 7160
GTGCACACCT GTAATCCCAG CTACTTGCGA GGCTGAGGCA GAGAATTGCT TGAACCCGGG 7220
AGGTGGAGGT TGCAGTGAGC TGAGATCGCA CCATTGCACT CCAGCCTTGG GTGACAGAGC 7280
AAGACTGCGT CAAAAAAAAA AAAAAAAAAA AAAAAAAGGA AAGAAAGAGA GAAAGAAAAG 7340
AAAAGAGAAA AGAAATCAGG AGATTTTACA CTAGCAATTC GGATTTCCAG CTCTGGAAAC 7400
ATGAAAAGGT TGAGCCCCAG CGTGCCTCTA AGCATCCCCA AATAGCCACA GAGTGGAGCT 7460
GGGCAGGGGC CACCCAAGCC AGGCATGTGT CCTCCAGTCT CCAGTTCCCA CCAGCCTATA 7520
CTCCTTTGTG CGTGTCTAAG TTTGGGGTCC TTGTGCCTGG TCTTACCCCC CTTAATGTGC 7580
AGAGGGAGGA ACCCACGGCC CAAGGTCACA TGATTGAGTT AGTAGCAGAG TCAGAGCTGG 7640
AACCGGGACG CATTTTTGTG GGTGCCCTGG GTAATTCTCC CTGGCCCTTA CATTAGTGTC 7700
CAGGCCCCGG GGACCCCGGC CCCGCTCTGG GGCAAGGGGT CGCATGGCAG CCAAAGGCCC 7760
CTCCCTGAGA GAAGCAAAAG GTCAGATGTC TCCTTTTCCT CTCCCCTTCC ACCATCCTCC 7820
CCCTGCAGAA ACATTCGCCG AGGAAGGTAC GACCCGACCC AGCTGGGGGC AGTGTGATGC 7880
CGGCCACGTC ATCCCTCCCG CTGTCCTTGT CTCCTCCATC TCATTCGTCA CCCGCGTTCT 7940
GTTGATGGGG TGCGGGGCCG ATCCCACCCT GCGTGCCNNN NNNNNNNNNN ATCTGTTTTG 8000
TCTTCCATAT CACCACTGTC TGACCTCCCG CAGATCCCTT CCCTGGCCAG CCTGTGACTT 8060
GCCGCCTGCC TCCAGGGCCC AGAACTGAGC TCCGGGGCCC TGCTGGGGGG CTCTCCCCGA 8120
GGCCCCTGCT CACGTCCTCC CCTGATGCCC CCTCTCCGTT CCAGGTCCGG AGGGGCCTAA 8180
GTGCCACCCT CCTCCCTCCA AGCCTGGCAC CCACCGTCTC GGCCCTGAAC CACGAGGCTG 8240
CCCCCATCCA GCCACGGAGG GAGGCACCAT GCAAATGTCT TCCAGGTCCA AACCCTTCAA 8300
CTCCTGGCTC CGCAGGGGTT TGGGTGGGGG CTGTGGCCCT GCCCTTGGCA CCACCAGGGT 8360
GGACCAGGCC TGGAGGGCAC TTCCTCCACA GTCCCCCACC CACCTCCTGC GGCTCAGCCT 8420
TGCACACCCA CTGCCCCGTG TGAATGTAGC TTCCACCTCA TGGATTGCCA CAGCTCAACT 8480
CGGGGGCACC TGGAGGGATG CCCCCAGGCA GCCACCAGTG GACCTAGCCT GGATGGCCCC 8540
TCCTTGCAAC CAGGCAGCTG AGACCAGGGT CTTATCTCTC TGGGACCTAG GGGGACGGGG 8600
CTGACATCTA CATTTTTTAA AACTGAATCT TAATCGATGA ATGTAAACTC GGGGGTGCTG 8660
GGGCCAGGGC AGATGTGGGG ATGTTTTGAC ATTTACAGGA GGCCCCGGAG AAACTGAGGT 8720
ATGGCCATGC CCTAGACCCT CCCCAAGGAT GACCACACCC GAAGTCCTGT CACTGAGCAC 8780
AGTCAGGGGC TGGGCATCCC AGCTTGCCCC CGCTTAGCCC CGCTGAGCTT GGAGGAAGTA 8840
TGAGTGCTGA TTCAAACCAA AGCTGCCTGT GCCATGCCCA AGGCCTAGGT TATGGGTACG 8900
[...] [...] [...] CGAAAACAAC CGTCCTGCTG TCCCTGTCAG 8960
GACACATGGA TTTTGGCAGG GCGGGGGGGG GTTCTAGAAA ATATAGGTTC CTATAATAAA 9020
ATGGCACCTT CCCCCTTTNN NNNNNNNNNN NNNGGGATAC CTCTGAATAT GGGTATCTGG 9080
GGCTGGATAT GGGTGGGACA TGAGACTTCC TGTGACCAGC CACCCTGGCT CCCAGCTCTC 9140
TGTATCCTCC TGCCCCGCCC TGGGGGGTGC CTACCCTGGN AGAACCCAGG GAGGAGTGGA 9200
GGCTGCCTCT GCCTGGGCCT CCACACAGCA TCCTGACATA CGCCACCTGG GGTGGGGGTG 9260
GGGAGGCAGG GCCAGGAG 9278
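
The three-letter translation printed beneath the exons of SEQ ID NO: 9 can be spot-checked mechanically. The following is a minimal Python sketch, assuming the standard genetic code; the names `CODON_TABLE`, `THREE_LETTER`, `translate` and `EXON_SEGMENT` are illustrative and do not appear in the specification. It translates the in-frame exon segment printed at positions 2360 to 2407 of the listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in base order T, C, A, G
# (third base varying fastest). This is an assumption of the sketch.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter amino acid codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "***",
}


def translate(dna: str) -> list:
    """Translate an in-frame DNA string (whitespace ignored) codon by codon."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[seq[i:i + 3]]] for i in range(0, len(seq), 3)]


# Exon codons as printed for positions 2360 to 2407 of SEQ ID NO: 9.
EXON_SEGMENT = "GTG GTC CTC ACC AGC AAC TTT GCC AAG TCC GTG GTG AAC CTG GCC GAT"

print(" ".join(translate(EXON_SEGMENT)))
# Val Val Leu Thr Ser Asn Phe Ala Lys Ser Val Val Asn Leu Ala Asp
```

The output agrees with residues 160 to 175 as printed in the listing. The sketch handles only complete in-frame codons; codons split across an exon-intron boundary would have to be joined by hand before translation.
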
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GCACCTGCCC CGGCAGT 17

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CCAGGACAGC CCCAGCGATG 20

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GGCTGCTGAT CGCTTCTGCT AC 22



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

GAGAAGCTGA ATGTGGAGGG 20

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

GTCAGAGCCG TCCGCCAGC 19

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

GCCATCCTCC ACATAGCTCA GG 22
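
An entry marked ANTI-SENSE: YES is written as the reverse complement of the strand shown in SEQ ID NO: 9, so it has to be reverse-complemented before it can be located in the genomic listing. The following is a minimal Python sketch of that check, assuming the OCR-cleaned rows for positions 6261 to 6440 of SEQ ID NO: 9 are accurate; the helper names `clean`, `reverse_complement` and `binding_site` are hypothetical and not part of the specification.

```python
# Complement table for the characters used in the listing (N = undetermined base).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def clean(seq: str) -> str:
    """Remove the spacing used in the listing and upper-case the sequence."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


# Rows of SEQ ID NO: 9 covering positions 6261 to 6440, as printed in the
# listing (assumed to be free of OCR errors for the purpose of this sketch).
GENOMIC_6261_6440 = clean("""
GATCTGGGGG CTGGAGGCTG GGCTGTGTCA CTTCCCCAGG GGAGGCCACG TGCAGCTGGC
GGACGGCTCT GACCTGAGCT ATGTGGAGGA TGGCACAGCC TGCGGGCCTA ACATGTTGTG
CCTGGACCAT CGCTGCCTGC CAGCTTCTGC CTTCAACTTC AGCACCTGCC CCGGCAGTGG
""")

# (sequence, anti-sense flag) taken from the corresponding listing entries.
PRIMERS = {
    "SEQ ID NO: 10": ("GCACCTGCCC CGGCAGT", False),
    "SEQ ID NO: 14": ("GTCAGAGCCG TCCGCCAGC", True),
    "SEQ ID NO: 15": ("GCCATCCTCC ACATAGCTCA GG", True),
}


def binding_site(primer: str, anti_sense: bool, template: str) -> int:
    """Return the 0-based offset of the primer site in template, or -1 if absent."""
    target = reverse_complement(primer) if anti_sense else clean(primer)
    return template.find(target)


for name, (sequence, anti_sense) in PRIMERS.items():
    offset = binding_site(sequence, anti_sense, GENOMIC_6261_6440)
    print(name, "found" if offset != -1 else "not found", offset)
```

Under these assumptions all three oligonucleotides are found in the fragment. The sketch uses exact string matching only; it makes no statement about hybridization conditions or mismatch tolerance.
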

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

GATGTAAGTC AAGTTCCCAT CAGAGA 26

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

AACAGCTGGT GGTCGTTGAT CACAA 25

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

ATGAGGCTGC TGCGGCGCTG 20

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

CACAGATCTG GGGGCATATG CTCCCTG 27

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

AACAAGCTTC TACTGATGTC TCCCACC 27

Claims

1. An MDC protein which comprises the whole or part of the protein represented by SEQ ID NO:1, or
which consists of a protein substantially equivalent to one comprising the whole or part of the protein
represented by SEQ ID NO:1.

2. The MDC protein as claimed in claim 1, which comprises the whole or part of the protein represented
by SEQ ID NO:2, or which consists of a protein substantially equivalent to one comprising the whole or
part of the protein represented by SEQ ID NO:2.

3. The MDC protein as claimed in claim 1, which comprises the whole or part of the protein represented
by SEQ ID NO:3, or which consists of a protein substantially equivalent to one comprising the whole or
part of the protein represented by SEQ ID NO:3.

4. The MDC protein as claimed in claim 1, which comprises the whole or part of the protein represented
by SEQ ID NO:4, or which consists of a protein substantially equivalent to one comprising the whole or
part of the protein represented by SEQ ID NO:4.

5. The MDC protein as claimed in any of the claims 1 to 4, comprising a polypeptide having an amino
acid sequence consisting of continuous at least 3 to 5 amino acids, at least 8 to 10 amino acids, at
least 11 to 20 amino acids, or more than 20 amino acids in the sequence represented by the SEQ ID
NO:1, SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4.

6. The MDC protein as claimed in any of the claims 1 to 5 comprising a protein substantially equivalent to
one comprising the whole or part of the protein represented by SEQ ID NO:1, SEQ ID NO:2, SEQ ID
NO:3, or SEQ ID NO:4, wherein one or more amino acids have been replaced, deleted and/or inserted,
but still producing an equal effect in research and diagnosing.

7. The MDC protein as claimed in claim 1, which comprises a polypeptide having an amino acid
sequence consisting of continuous, at least eight amino acids in the sequence represented by the SEQ
ID NO:1.

8. A DNA encoding the MDC protein as claimed in claim 1, which comprises the whole or part of the DNA
represented by SEQ ID NO:5, or which consists of a DNA substantially equivalent to one comprising
the whole or part of the DNA represented by SEQ ID NO:5.

9. The DNA as claimed in claim 8, which comprises the whole or part of the DNA represented by SEQ ID
NO:6, or which consists of a DNA substantially equivalent to one comprising the whole or part of the
DNA represented by SEQ ID NO:6.

10. The DNA as claimed in claim 8, which comprises the whole or part of the DNA represented by SEQ ID
NO:7, or which consists of a DNA substantially equivalent to one comprising the whole or part of the
DNA represented by SEQ ID NO:7.

11. The DNA as claimed in claim 8, which comprises the whole or part of the DNA represented by SEQ ID
NO:8, or which consists of a DNA substantially equivalent to one comprising the whole or part of the
DNA represented by SEQ ID NO:8.

12. The DNA as claimed in any of the claims 8 to 11, comprising a DNA sequence consisting of at least 6
bases, at least 8 bases, at least 10 to 12 bases or about 15 to 25 bases.

13. A DNA which comprises the whole or part of the DNA represented by SEQ ID NO:9 including exons
and introns therein, or which consists of a DNA substantially equivalent to one comprising the whole or
part of the DNA represented by SEQ ID NO:9 including exons and introns therein.

14. The DNA as claimed in any of the claims 8 to 13, comprising a DNA sequence substantially equivalent
to one comprising the whole or part of the DNA represented by SEQ ID NO:5, SEQ ID NO:6, SEQ ID
NO:7, SEQ ID NO:8 or SEQ ID NO:9, wherein one or more bases have been replaced, deleted and/or
inserted, but still producing an equal effect in gene analysis and diagnosis.

15. A plasmid containing the DNA as claimed in claim 8.

16. A plasmid containing the DNA as claimed in claim 13.

17. A transformant carrying the plasmid as claimed in claim 15. 

18. A transformant carrying the plasmid as claimed in claim 16. 

19. A process for the production of the MDC protein as claimed in claim 1, which comprises the steps of
culturing the transformant as claimed in claim 17 and collecting the resulting expression product.

20. A process for the production of the MDC protein as claimed in claim 1, which comprises the steps of
culturing the transformant as claimed in claim 18 and collecting the resulting expression product.

21. An antibody combinable to the MDC protein as claimed in claim 1.

22. A primer or probe which has a DNA sequence, comprising a part of the DNA sequence of the DNA as
claimed in claim 8, or a DNA sequence complementary to a part of the DNA sequence of the DNA as
claimed in claim 8.

23. The primer or probe as claimed in claim 22, wherein the part of the DNA sequence consists of at least
six bases.

24. A primer or probe which has a DNA sequence, comprising a part of the DNA sequence of the DNA as
claimed in claim 13, or a DNA sequence complementary to a part of the DNA sequence of the DNA as
claimed in claim 13.

25. The primer or probe as claimed in claim 24, wherein the part of the DNA sequence consists of at least
six bases.

26. A gene analysis method which comprises the step of hybridizing the primer or probe as claimed in
claim 22 to a DNA to be tested.

27. A gene analysis method which comprises the step of hybridizing the primer or probe as claimed in
claim 24 to a DNA to be tested.

(54) MDC proteins and DNAs encoding the same

(57) The present invention provides a gene present in a commonly deleted region of a chromosome in
breast and ovarian cancers and encoding a novel protein, the protein ("MDC protein") encoded by the
gene, a method for the diagnosis of cancer by using an antibody combinable to the protein, and others.

A detailed genetic map of human chromosome 17 was constructed to analyze the chromosome in breast
and ovarian cancer tissues, and a gene encoding a novel protein was cloned and its structure was
determined. As a result of gene analysis using DNA probes derived from the gene, a gene mutation was
confirmed in breast cancer tissues. Moreover, a transformant carrying a plasmid containing the gene was
grown to obtain the MDC protein. Furthermore, a monoclonal antibody was prepared by using the protein
as antigen.